Introduction


The modern axiomatics of probability theory was created by Andrei Niko- 
laevich Kolmogorov in 1933. This axiomatics is based on the measure- 
theoretical approach to probability. The main advantage of the Kolmogorov 
probabilistic formalism is the high level of abstractness. The use of abstract 
probability spaces makes it possible to develop a general probabilistic
calculus in which the structures of concrete spaces of elementary events do
not play any role. The Kolmogorov probabilistic formalism was successfully
developed in many directions. This formalism is now the mathematical basis
for numerous probabilistic models in physics, engineering, biology and
finance. Everything
is perfect in the landscape of Kolmogorov probability theory except for one 
dark cloud obscuring the horizon. 

This cloud is the probabilistic foundation of quantum mechanics. It was
generated by A. Einstein, B. Podolsky and N. Rosen (EPR) in 1935 (just two
years after the creation of the axiomatics of probability theory). EPR
started the discussion on the completeness of quantum mechanics (the
possibility of describing the whole of physical reality by the formalism of
quantum mechanics). In fact, the problem of completeness is closely
connected with the foundations of probability theory. EPR proposed some
arguments that can be interpreted as evidence of the incompleteness of
quantum mechanics. The subsequent discussion of the EPR arguments
demonstrated that quantum mechanics has quite shaky foundations. The EPR
arguments are often even considered as a paradox in the foundations of
quantum mechanics. During the following thirty years the dark quantum cloud
obscuring the landscape of the Kolmogorov probability theory was rather
small. The probabilistic roots of the EPR paradox were not so evident.
Nobody tried to connect the paradox in the foundations of quantum mechanics
with the foundations of probability theory. The first attempt to provide a
probabilistic representation of the EPR considerations was made by J. Bell,
who in 1964 found the famous Bell’s inequality for covariations of physical
observables involved in the EPR experiment. The black quantum cloud became
quite large. Already in 1964 it not only blotted out the sun of the
Kolmogorov landscape, but was gathering to obscure the beautiful idea of a
unique and general probability theory. However, nothing occurred in 1964.
Moreover, nothing occurred in the following thirty years. And it seems that
nothing is going to happen to the unique and general Kolmogorov probability
theory.

The great Kolmogorov probability community is still working in the stan- 
dard measure-theoretical formalism. They do not pay attention to quantum 
clouds. On the other hand, the majority of the physical community observes 
this cloud. However, physicists do not understand the hidden probabilistic 
structure of this cloud. Some of them support the idea of the death of reality. 
They think that it is impossible to use realism in quantum considerations. 
And if there is no reality at all, they are not afraid of this non-real
cloud. Other physicists support the idea of nonlocality. They think that
physical reality is nonlocal. Thus by performing a measurement on a quantum
system in Moscow we change the quantum state of a correlated quantum system
located in Vladivostok. The adherents of nonlocality also do not observe the
black cloud: this cloud is distributed everywhere and, hence, such a cloud
could not induce a storm.

There are many reasons for this strange situation. One of them is purely 
psychological. Mathematicians are not interested in quantum physics (mainly 
because they still do not know quantum theory). Physicists are not
interested in the foundations of probability theory (mainly because they do
not know much even about the standard Kolmogorov measure-theoretical
approach). In principle, even J. Bell in 1964 could have noticed that Bell’s
inequality is connected not only with such properties of physical
observables as realism and locality, but also with the way of the
probabilistic description. However, this was not done¹. Bell’s inequality
was not considered as a sign for reconsideration of the foundations of
probability theory. In contrast to geometry, probability theory was not
transformed into an elastic formalism containing numerous probabilistic
models which can be used for descriptions of different physical phenomena.
Probability theory is still a rigid structure. This structure can be
compared with the rigid Euclidean cube. Attempts to use the unique
Kolmogorov model for describing all physical phenomena can


¹ I had numerous discussions with scientists who worked with J. Bell. Unfortunately, the
general opinion is that J. Bell was never interested in probability theory.
be compared with attempts to represent all geometrical models by Euclidean
cubes. However, geometric reality is not restricted to the reality of cubes,
just as probabilistic reality is not restricted to the reality of Kolmogorov
probability spaces.

In this book we demonstrate that the ‘pathological behaviour’ of ‘quantum
probabilities’ is a consequence of the use of Kolmogorov’s approach. The
high level of abstractness makes it impossible to control the connection
between probabilities and statistical ensembles or random sequences
(collectives). Formal manipulations with abstract Kolmogorov probabilities
produce such monsters as Bell’s inequality (in fact, this idea was already
discussed by L. de Broglie and later improved by G. Lochak).

The first attempts to reduce the EPR paradox to the use of one concrete
probability model, namely, the Kolmogorov model, were the works of L. Accardi
and I. Pitowsky. L. Accardi proposed a rather formal non-Kolmogorovean
model which did not contain Bayes’ formula. I. Pitowsky proposed to consider
events which are described by nonmeasurable sets. The latter formalism
was strongly improved by S. Gudder, who developed the theory of probability
manifolds. The main disadvantage of all these models is that they have
an even higher level of abstractness than Kolmogorov’s model. Therefore they
provide merely a new description of quantum phenomena. They could not
explain the probabilistic roots of quantum behaviour. The same can be said
about so-called quantum probabilities. Of course, quantum probability
calculus gives a useful and convenient description of quantum phenomena.
However, quantum probability has no direct connection with probability.
This is just a rather speculative use of the word ‘probability’ in some formal
mathematical constructions.

In this book we explain quantum probabilistic behaviour by using two basic
interpretations of probability: ensemble and frequency. We demonstrate
that (contrary to common opinion) the ensemble and frequency probability
models are not in general equivalent to Kolmogorov’s model.

The frequency model is von Mises’ theory of collectives (random
sequences), 1919. In this model probabilities are defined as limits of
relative frequencies, $\nu_N = n/N$, in a collective $x$. This model was
intensively studied in probability theory to find a reasonable definition of
a random sequence (Kolmogorov algorithmic complexity and the Martin-Löf
theory of tests for randomness). It is the common opinion that frequency
probabilities can be reduced to Kolmogorov probabilities. We explain (using
the original arguments of R. von Mises) that such a viewpoint is totally
wrong. In particular, the law of large numbers does not describe the
statistical stabilization of relative frequencies, $\nu_N = n/N$, in a
collective $x$. It will be shown that the frequency probability model has
many features which differ greatly from features of Kolmogorov’s model. The
most important feature of frequency probability is the dependence of
probabilities on a collective $x$. In particular, careful control of such a
dependence makes it possible to eliminate Bell’s inequality from
considerations. The frequency model also differs from Kolmogorov’s model in
the approach to conditional probabilities and independence. Such a
difference also plays an important role in quantum considerations.

The ensemble model, like the frequency model, is one of the basic
‘pre-Kolmogorov’ models. For example, the well-known Bernoulli theorem is,
in fact, a theorem for ensemble probabilities. It is commonly supposed that
the definition of ensemble probability (as a proportion in an ensemble) is
just a particular case of Kolmogorov’s measure-theoretical definition. This
is not right. The ensemble probability model cannot be reduced to
Kolmogorov’s one. The most important feature of ensemble probabilities is
the dependence on an ensemble. In particular, careful control of such a
dependence makes it possible to eliminate Bell’s inequality from
considerations. It is impossible to use Kolmogorov’s measure-theoretical
approach for describing ensemble probabilities for infinite ensembles. In
fact, Kolmogorov’s measures are not proportional distributions of properties
of ensembles of physical systems. These are measures on ensembles of all
possible sequences of results of measurements. To obtain an adequate
mathematical description of ensemble (proportional) probabilities for
infinite ensembles (and quantum states describe such ideal ensembles), we
have to leave the domain of Kolmogorov’s probability model and, moreover,
the domain of real analysis. We have to use number systems which contain
actual infinities. In this book we use systems of so-called p-adic numbers
$\mathbf{Q}_p$ (where $p > 1$ is a prime number) for the description of some
infinite statistical ensembles ($\mathbf{Q}_p$ contains actual infinities).

The origin of p-adic ensemble probabilities can be illustrated by the
following example. Let $S$ be an infinite ensemble of balls. Each ball has
some colour $c \in C = \{0, 1, 2, \ldots, k, \ldots\}$ (a countable system of
colours). The ensemble $S$ has the following colour structure: there are
$n_k = 2^k$ balls with the colour $k \in C$ in $S$. The ‘volume’ $N = |S|$
of $S$ can be easily found:

$$N = \sum_{k=0}^{\infty} n_k = \sum_{k=0}^{\infty} 2^k.$$

Of course, this series diverges in the field of real numbers $\mathbf{R}$.
But it converges in the field of 2-adic numbers $\mathbf{Q}_2$. The sum of
this series in $\mathbf{Q}_2$ can be found by using the ordinary formula for
the sum of an infinite geometric progression (because $2^k \to 0$,
$k \to \infty$, in $\mathbf{Q}_2$):

$$N = \frac{1}{1 - 2} = -1.$$

We can now find the proportion of balls with the colour $k \in C$ in the
ensemble $S$:

$$P_S(k) = \frac{n_k}{N} = -2^k.$$

We remark that, as $n_k = 2^k$ is a finite number and $N = -1$ is an infinite
number, the probability $P_S(k)$ is an infinitely small probability. And such a
probability is represented by a negative number. This approach induces a
rigorous mathematical theory of negative probabilities.
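
The 2-adic convergence used in this example can be checked numerically. The
following Python sketch is only an illustration: the helper names
`two_adic_norm` and `partial_volume` are ours and do not come from the book.
It computes the 2-adic distance between the partial sums $1 + 2 + \ldots + 2^m$
and the number $-1$, using the standard 2-adic norm $|n|_2 = 2^{-v}$, where
$2^v$ is the largest power of 2 dividing $n$.

```python
from fractions import Fraction

def two_adic_norm(n: int) -> Fraction:
    """2-adic norm |n|_2 = 2^(-v), where 2^v is the largest power of 2 dividing n."""
    if n == 0:
        return Fraction(0)
    v = 0
    while n % 2 == 0:
        n //= 2
        v += 1
    return Fraction(1, 2 ** v)

def partial_volume(m: int) -> int:
    """Partial sum 1 + 2 + ... + 2^m of the series for the 'volume' N = |S|."""
    return sum(2 ** k for k in range(m + 1))

# 2-adic distance between each partial sum and the claimed limit -1.
distances = [two_adic_norm(partial_volume(m) - (-1)) for m in range(6)]
print(distances)  # the distances are 1/2, 1/4, 1/8, 1/16, 1/32, 1/64
```

The distances halve at every step, so the partial sums approach $-1$ in the
2-adic metric although they grow without bound in the real metric.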

In fact, negative probabilities are another cloud obscuring the Kolmogorov
probability landscape. Negative probabilities (which could not be justified
by Kolmogorov’s model) arise with strange regularity in practically all
quantum models. The most famous are the Wigner distribution on the phase
space and Dirac’s negative probability distributions in the formalism of
relativistic quantization. Such ‘probability distributions’ are considered
as monsters of quantum theory. For example, physicists always underline that
the Wigner distribution is not really a probability distribution. At the
same time they continue to use it for describing probabilistic phenomena. In
this book negative probabilities (in particular, the Wigner distribution)
are realized as probabilities with respect to infinitely large statistical
ensembles. In many physical models these probabilities have the
interpretation of infinitely small probabilities. Negative probabilities can
also be obtained in the frequency approach as limits of relative
frequencies, $\nu_N = n/N$, with respect to some topology on the set of
rational numbers $\mathbf{Q}$ which differs from the standard real topology
(and frequencies $\nu_N = n/N$ always belong to $\mathbf{Q}$). For example,
in the p-adic topology the probability $P = -1$ can be obtained as the limit
of frequencies:

$$P = -1 = \lim \nu_N.$$

Typically in the frequency approach the presence of negative probabilities
is a manifestation of the violation of the principle of the statistical
stabilization for relative frequencies with respect to the real topology.
Negative frequency probabilities can also appear via, for example, a p-adic
splitting of the conventional probability $P = 0$. In the latter case
negative probabilities can again be interpreted as infinitely small
probabilities.

In one part of the book we investigate the connection between information
and probability. A model of purely information physical reality is developed:
here the basic objects are information objects (so-called transformers of
information), physical processes are purely information processes and statistics
is information statistics. We investigate models of classical and quantum 
mechanics (in particular, the pilot wave theory) on information spaces. Such 
information mechanics can be used for describing cognitive, social, psycho- 
logical and even anomalous phenomena. Subjective probability is considered 
from the information viewpoint (as the ensemble probability for information 
ensembles; in particular, ensembles of human ideas). Such an interpretation 
of subjective probabilities is used for studying some psychological phenom- 
ena. We try to explain Freud’s psychoanalysis by considering ensembles of 
conscious and unconscious ideas and roles of these two classes of ideas in 
Bayes’ formula for conditional probabilities. 

The last chapter of the book has a purely mathematical character. Here we
develop a p-adic analogue of the Martin-Löf theory of tests for randomness (to
find a p-adic analogue of a random sequence). The p-adic theory differs strongly
from the real one. There is no universal test for randomness (at least for
the uniform p-adic probability distribution). In this sense the p-adic theory
of randomness is similar to Schnorr’s theory for Kolmogorov probabilities.
We also obtain a large class of limit theorems for p-adic probabilities. In
particular, these limit theorems can be applied to negative probabilities. We 
remark that the first limit theorem for negative probabilities was proved by 
M. Barnett in 1944. 


Main consequences of the book: 

1. Kolmogorov’s probability theory (measure-theoretical approach) is just
one of many probability models.

2. Two fundamental interpretations of probability, namely, the ensemble 
and frequency interpretations, can be used as the basis for numerous non- 
Kolmogorovean models. 

3. Negative probabilities are well defined on the mathematical level of 
rigorousness. 

4. Pathological (or nonclassical) behaviour of ‘quantum probabilities’ 
(and, in particular, Bell’s inequality) is a consequence of the formal use of 
Kolmogorov’s probability model.

5. Bell’s inequality cannot be used as an argument for nonlocality or
nonreality. It may be that physical reality is nonlocal or nonobjective.
However, Bell’s inequality has nothing to do with these problems.

6. The Wigner distribution is well defined both in the ensemble and
frequency frameworks. 

7. From the frequency viewpoint non-Kolmogorovean probabilistic
behaviour is (typically) a manifestation of the violation of the law of large
numbers.

8. From the ensemble viewpoint non-Kolmogorovean probabilistic be- 
haviour is a consequence of the use of ensembles of infinitely large ‘volume’. 
There is no statistical reproducibility of properties for finite approximations 
of these infinite ensembles. 

9. Quantum states (wave functions) describe such infinite (ideal) ensem- 
bles with statistical nonreproducibility of properties. 


A large number of mathematicians and physicists took part in the
discussion of the results presented in this book. I want to use the
opportunity to express my deepest gratitude to all of them. I feel
especially indebted to L. Accardi, S. Albeverio, H. Atmanspacher, Z. Hradil,
W. de Muynck, H. Rauch, J. Summhammer for fruitful discussions.


Växjö-Clermont-Ferrand-Tokyo, 1998-99.
Foundations of Probability Theory


There is no ‘general probability theory’. There exists an incredible number of
different mathematical probabilistic formalisms [74]-[78], [15], [16], [40], [23],
[39], [50], [86]-[88], [1], [42], [92], [60], and, moreover, each of these formalisms
has a few different interpretations. We shall discuss some of these theories 
which will be useful in further physical considerations. 


1 A few words about measures 


We recall some notions of measure theory. A system $F$ of subsets of a set
$\Omega$ is called an algebra if the sets $\emptyset$, $\Omega$ belong to $F$
and the union, intersection and difference of two sets of $F$ also belong to
$F$. In particular, for any $A \in F$, the complement
$\bar{A} = \Omega \setminus A$ of $A$ belongs to $F$. Denote by $F_\Omega$ the
family of all subsets of $\Omega$. This is the simplest example of an algebra.

Let $F$ be an algebra. A map $\mu : F \to \mathbf{R}_+$ is said to be a
measure if $\mu(A \cup B) = \mu(A) + \mu(B)$ for $A, B \in F$,
$A \cap B = \emptyset$. A measure $\mu$ is called $\sigma$-additive if, for
every sequence $\{A_n\}_{n=1}^{\infty}$ of pairwise disjoint sets $A_n \in F$
such that their union $A = \bigcup_{n=1}^{\infty} A_n$ belongs to $F$,
$\mu(A) = \sum_{n=1}^{\infty} \mu(A_n)$.

An algebra $F$ is said to be a $\sigma$-algebra if, for every sequence
$\{A_n\}_{n=1}^{\infty}$ of sets $A_n \in F$, their union
$A = \bigcup_{n=1}^{\infty} A_n$ belongs to $F$.

Let $\Omega_1$, $\Omega_2$ be arbitrary sets and let $G_1$, $G_2$ be some
systems of subsets of $\Omega_1$ and $\Omega_2$, respectively. A map
$\xi : \Omega_1 \to \Omega_2$ is called measurable (or more precisely
$((\Omega_1, G_1), (\Omega_2, G_2))$-measurable) if, for any set $A \in G_2$,
the set $\xi^{-1}(A) \in G_1$. We shall use the notation
$\xi : (\Omega_1, G_1) \to (\Omega_2, G_2)$ to indicate the dependence on
$G_1$, $G_2$. Typically we shall consider measurability of maps in the case
in which $G_j$, $j = 1, 2$, are algebras or $\sigma$-algebras.

Let $A$ be a set. The characteristic function $I_A$ of the set $A$ is defined
as $I_A(x) = 1$, $x \in A$, and $I_A(x) = 0$, $x \notin A$.

Let $A = \{a_1, \ldots, a_n\}$ be a finite set. We shall denote the
cardinality $n$ of $A$ by the symbol $|A|$.
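
For finite sets the definitions of an algebra and of a measure can be
verified by brute force. The Python sketch below is an illustration under
our own naming (`power_set`, `is_algebra`, `is_additive` are hypothetical
helpers, not notions from the text). It checks that the family of all
subsets of a three-point set is an algebra, that a smaller family is not,
and that the cardinality $|A|$ is an additive set function.

```python
from itertools import chain, combinations

def power_set(omega):
    """All subsets of omega as frozensets (the family F_Omega)."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

def is_algebra(family, omega):
    """Check the closure conditions from the definition of an algebra."""
    omega = frozenset(omega)
    if frozenset() not in family or omega not in family:
        return False
    return all(
        (a | b) in family and (a & b) in family and (a - b) in family
        for a in family
        for b in family
    )

def is_additive(mu, family):
    """Check mu(A u B) = mu(A) + mu(B) for disjoint A, B of the family."""
    return all(
        mu(a | b) == mu(a) + mu(b)
        for a in family
        for b in family
        if not (a & b) and (a | b) in family
    )

omega = {1, 2, 3}
f_omega = power_set(omega)
# This family lacks the complement {2, 3} of {1}, so it is not an algebra.
not_algebra = {frozenset(), frozenset({1}), frozenset(omega)}

print(is_algebra(f_omega, omega), is_algebra(not_algebra, omega))  # True False
print(is_additive(len, f_omega))                                   # True
```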


2 Classical and ensemble definitions of probability


1. Classical definition of probability. The theory of probability origi- 
nated from the study of problems connected with ordinary games of chance. 
In all these games the results that are a priori possible may be arranged in a 
finite number of cases assumed to be perfectly symmetrical, such as the cases 
represented by the six sides of a die, the 52 cards in an ordinary pack of
cards, and so on. This fact seemed to provide a basis for a rational explana- 
tion of the observed stability of statistical frequencies, and the 18th century 
mathematicians were thus led to the introduction of the famous principle of 
equally possible cases. According to this principle, a division into equally 
possible cases is possible in all random experiments, and the probability of 
an event is defined as the ratio between the number of cases favorable to 
the event, and the total number of possible cases. The main disadvantage 
of this probability theory is that the idea of symmetry cannot be applied to 
all random phenomena. For example, the classical definition of probability 
describes only a symmetric coin or die. This definition cannot be used in the
case of a violation of symmetry (see von Mises [88] for an extended critique
of the classical definition). Denote by $C$ the set of all possible cases.
The classical theory operated on finite sets $C = \{c_1, \ldots, c_N\}$. For
example, if a die is considered, then $C = \{1, \ldots, 6\}$. Let $E$ belong
to the algebra $F_C$ of all subsets of the set $C$. Then classical
probability is defined by the equality

$$P(E) = |E|/|C|. \tag{2.1}$$

The map $P : F_C \to T_C \subset \mathbf{R}_+$, where
$T_C = \{x = k/N : k = 0, 1, \ldots, N\}$, $N = |C|$, is a measure and
$P(C) = 1$. This measure is uniform: $P(\{c\}) = 1/N$ and
$P(E) = \frac{1}{N} \sum_{c \in E} 1$.
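
As a small illustration of (2.1), the following Python sketch computes
classical probabilities for a die with exact rational arithmetic, so that
the values stay in the set $T_C$ of fractions $k/N$. The function name
`classical_probability` is our own choice and not part of the text.

```python
from fractions import Fraction

def classical_probability(event, cases):
    """P(E) = |E| / |C| for an event E contained in the finite set of cases C."""
    event, cases = set(event), set(cases)
    if not event <= cases:
        raise ValueError("the event must be a subset of the set of cases")
    return Fraction(len(event), len(cases))

die = range(1, 7)   # C = {1, ..., 6}
even = {2, 4, 6}
odd = {1, 3, 5}

print(classical_probability(even, die))   # 1/2
print(classical_probability({6}, die))    # 1/6
# Additivity on disjoint events and normalization P(C) = 1.
print(classical_probability(even, die) + classical_probability(odd, die))  # 1
```
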
We could not use (2.1) for infinite sets $C$ in the framework of real analysis
(there are no actual infinities in $\mathbf{R}$). This problem seems to be
solved on the basis of the Kolmogorov measure-theoretic approach. But the
classical definition (2.1) is not preserved in that approach. There are other
possibilities to extend the classical definition of probability to infinite
sets $C$. In principle we need not identify the set $T_C$ of values of the
classical probability with a subset of the set $\mathbf{R}$ of real numbers.
It can be considered as just a subset of the set $\mathbf{Q}$ of rational
numbers. It would be possible to extend the classical definition of
probability by identifying $T_C$ with a subset of another number system $X$
such that $\mathbf{Q} \subset X$, see Chapter 4.

2. Ensemble (proportional) definition of probability. We start
with the following classical example. There is an urn which contains balls of
two colours, black and white. Let $N_b$ and $N_w$ be respectively the numbers
of black and white balls; $N = N_b + N_w$ is the total number of balls in the
urn. By definition a probability is the coefficient of the proportion between
the number of balls of the concrete colour and the total number of balls:
$P(b) = N_b/N$ and $P(w) = N_w/N$. In the general case we have a finite set
$S$ (an ensemble). Elements $s$ of $S$ have some properties. Denote the set of
these properties by $\pi_S$. Each property $\xi \in \pi_S$ can be described as
a map $\xi : S \to K_\xi$, where $K_\xi = \{1, 2, \ldots, k_\xi\}$ is a finite
set (a numerical code of the property $\xi$). We set
$S(\xi = j) = \{s \in S : \xi(s) = j\}$; denote by $F(\pi_S)$ the collection
of all these sets. By definition these are events and their probability is
defined by

$$P(S(\xi = j)) = \frac{|S(\xi = j)|}{|S|}. \tag{2.2}$$

If we assume that $F(\pi_S)$ is an algebra of sets then the map
$P : F(\pi_S) \to T_S \subset \mathbf{R}_+$, where
$T_S = \{x = k/N : k = 0, 1, \ldots, N\}$ and $N = |S|$, is a measure and
$P(S) = 1$. If all one-point sets $\{s\}$ belong to the algebra $F(\pi_S)$,
then $F(\pi_S)$ is the algebra of all subsets of $S$ (i.e.,
$F(\pi_S) = F_S$) and $P$ is the uniform distribution: $P(\{s\}) = 1/N$. In
this case we can connect the ensemble (proportional) definition with the
classical definition: the elements of the ensemble $S$ can be interpreted as
equally possible cases.

Conditional probabilities will play an essential role in further quantum
considerations. Now we demonstrate how these probabilities are introduced
in the ensemble approach. Let $B = S(\xi = 1)$, $A = S(\eta = k)$,
$\xi, \eta \in \pi_S$. Let the set $C = A \cap B \in F(\pi_S)$. This means
that there exists a property $\theta \in \pi_S$ such that
$C = S(\theta = m)$. The conditional probability of the event $B$ under the
condition $A$ is defined as

$$P_S(B/A) = P_A(B) = |B \cap A|/|A|$$

(we must extract from the ensemble $S$ the sub-ensemble $A$ and find the
proportion of elements $s \in A$ which have the property $\xi(s) = 1$). Thus
we can easily obtain that

$$P_S(B/A) = P_S(B \cap A)/P_S(A), \quad P_S(A) > 0. \tag{2.3}$$

This is the well-known Bayes’ formula. We note that in the ensemble framework
it is a theorem. In standard textbooks the ensemble index is omitted:

$$P(B/A) = P(B \cap A)/P(A), \quad P(A) > 0. \tag{2.4}$$
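
The fact that Bayes’ formula is a theorem in the ensemble framework can be
seen on a toy ensemble. The Python sketch below is an assumption-laden
illustration: the ensemble `S`, the property names `colour` and `size`, and
the helpers `select` and `proportion` are all hypothetical, and string
values are used instead of numerical codes for readability. The conditional
probability is computed directly as a proportion inside the sub-ensemble
$A$ and then compared with the right-hand side of (2.3).

```python
from fractions import Fraction

# A hypothetical finite ensemble S; each element carries two properties.
S = [
    {"colour": "black", "size": "small"},
    {"colour": "black", "size": "large"},
    {"colour": "black", "size": "large"},
    {"colour": "white", "size": "small"},
    {"colour": "white", "size": "large"},
    {"colour": "white", "size": "small"},
]

def select(ensemble, prop, value):
    """Sub-ensemble {s in ensemble : prop(s) = value}."""
    return [s for s in ensemble if s[prop] == value]

def proportion(sub, ensemble):
    """Ensemble probability as in (2.2): |sub| / |ensemble|."""
    return Fraction(len(sub), len(ensemble))

A = select(S, "size", "large")           # the condition
B_and_A = select(A, "colour", "black")   # elements of A that also lie in B

# P_A(B): proportion of B inside the extracted sub-ensemble A.
p_B_given_A = proportion(B_and_A, A)
# Right-hand side of (2.3): P_S(B n A) / P_S(A), both taken in S.
bayes_rhs = proportion(B_and_A, S) / proportion(A, S)

print(p_B_given_A, bayes_rhs)  # 2/3 2/3
```

Note that the two sides are computed in different ensembles ($A$ on the
left, $S$ on the right), which is exactly the information lost when the
ensemble index is omitted.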


Remark 2.1. If $F(\pi_S)$ is not an algebra, then $A, B \in F(\pi_S)$ need not imply
that $C = A \cap B \in F(\pi_S)$. In this case we could not use Bayes’ formula (2.4).
Moreover, in such a case it is meaningless to speak about conditional probabilities.
There is no property $\theta$ of elements $s$ of $S$ such that $C = S(\theta = m)$. Thus the set
$C = \{s \in S : \xi(s) = 1\} \cap \{s \in S : \eta(s) = k\}$ cannot be described by properties of
$S$. From the physical viewpoint it means that we could not verify two properties $\xi$
and $\eta$ simultaneously. If we try to extract the sub-ensemble $A$ from $S$ by verifying
the property $\eta$, then we change the property $\xi$ of $s \in S$.

As a simple consequence of (2.4) we obtain another important formula:

$$P(A \cap B) = P(B/A)P(A). \tag{2.5}$$

By symmetry we find

$$P(A \cap B) = P(A/B)P(B). \tag{2.6}$$

Thus we have:

$$P(A/B) = \frac{P(B/A)P(A)}{P(B)}. \tag{2.7}$$

To be more careful, we have to indicate the dependence of probabilities on
corresponding ensembles: $P_B(A) = \frac{P_A(B) P_S(A)}{P_S(B)}$.

In further quantum considerations we shall often use the following
consequence of Bayes’ formula. Let $A_k \in F(\pi_S)$, $k = 1, \ldots, m$,
$\bigcup_{k=1}^{m} A_k = S$ and $A_k \cap A_l = \emptyset$, $k \neq l$. Then,
for every $C \in F(\pi_S)$ such that $C \cap A_k \in F(\pi_S)$, we have:

$$P_S(C) = \sum_{k=1}^{m} P_S(A_k) P_{A_k}(C).$$

It is the well-known formula of total probability. In standard textbooks this
formula is written as

$$P(C) = \sum_{k=1}^{m} P(A_k) P(C/A_k).$$


Thus the concrete ensembles which are used to define the probabilities on
the left-hand and right-hand sides are not taken into account. We shall see
that in the quantum formalism this manipulation with the ensemble index will
imply such unexpected consequences as non-locality of space-time,
super-luminal signals and the death of reality.
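
The role of the ensemble index in the formula of total probability can be
made explicit in code. In the Python sketch below (a toy illustration; the
ensemble `S`, its labels and the helper `proportion` are hypothetical), each
factor $P_{A_k}(C)$ is computed inside its own sub-ensemble $A_k$, while
$P_S(A_k)$ and $P_S(C)$ are computed in the whole ensemble $S$.

```python
from fractions import Fraction

# Hypothetical ensemble: pairs (label of the partition cell, has property C?).
S = [("a", True), ("a", False), ("a", False),
     ("b", True), ("b", True),
     ("c", False)]

def proportion(sub, ensemble):
    """Ensemble probability: |sub| / |ensemble|."""
    return Fraction(len(sub), len(ensemble))

C = [s for s in S if s[1]]
# The cells A_k are pairwise disjoint and their union is S.
partition = {label: [s for s in S if s[0] == label] for label in ("a", "b", "c")}

lhs = proportion(C, S)  # P_S(C)
rhs = sum(
    # P_S(A_k) is taken in S, P_{A_k}(C) is taken in the sub-ensemble A_k.
    proportion(A_k, S) * proportion([s for s in A_k if s[1]], A_k)
    for A_k in partition.values()
)

print(lhs, rhs)  # 1/2 1/2
```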

The direct generalization of proportional formula (2.2) for ensemble prob- 
abilities to infinite ensembles S is impossible in the framework of real analysis, 
because there are no actual infinities (infinitely large numbers) in the field 
of real numbers R. A measure-theoretical approach (see section 4) provides 
some indirect generalization. However, this measure-theoretical approach is 
not the only way to extend the proportional definition of probability to
infinite ensembles. In Chapter 4 we shall consider ensembles which have the
structure of trees with an infinite number of vertices (with $p$ branches
leaving each vertex; here $p > 1$ is a prime number). For such ensembles we
can directly use (2.2) to define ensemble probabilities (here $N = |S|$ can
be an infinitely large number belonging to the field of so-called p-adic
numbers). Another possibility for extending (2.2) to infinite ensembles $S$
is to use nonstandard analysis (see [3]).


3 Frequency theory of probability 


This theory was the first where the principle of the stabilization of
statistical frequencies was realized on a mathematical level. In fact, this
principle was used as the definition of probability. Let us recall the main
notions of the frequency theory of probability [86]-[88] of Richard von Mises
(1919)¹. This theory is based on the notion of a collective. Consider a
random experiment $S$ and denote by $L = \{s_1, \ldots, s_m\}$ the set of all
possible results of this experiment. The set $L$ is said to be the label set,
or the set of attributes. We consider only finite sets $L$. Let us consider
$N$ realizations of $S$ and write a


¹ In fact, already in 1866 John Venn, see [105], tried to define a probability explicitly
in terms of relative frequencies.
result $x_j$ after each realization. Then we obtain the finite sample:

$$x = (x_1, \ldots, x_N), \quad x_j \in L. \tag{3.1}$$

A collective is an infinite idealization of this finite sample:

$$x = (x_1, \ldots, x_N, \ldots), \quad x_j \in L, \tag{3.2}$$

for which the following two von Mises’ principles are valid.

The first is the statistical stabilization of relative frequencies of each
attribute $\alpha \in L$ in the sequence (3.2). Let us compute the
frequencies $\nu_N(\alpha; x) = n_N(\alpha; x)/N$, where $n_N(\alpha; x)$ is
the number of realizations of the attribute $\alpha$ in the first $N$ tests.
The principle of the statistical stabilization of relative frequencies says:
the frequency $\nu_N(\alpha; x)$ approaches a limit as $N$ approaches
infinity for every label $\alpha \in L$. This limit
$P(\alpha) = \lim \nu_N(\alpha; x)$ is said to be the probability of the
label $\alpha$ in the frequency theory of probability. Sometimes this
probability will be denoted by $P_x(\alpha)$ (to show the dependence on the
collective $x$).

“We will say that a collective is a mass phenomenon or a repetitive event, 
or simply a long sequence of observations for which there are sufficient reasons 
to believe that the relative frequency of the observed attribute would tend to 
a fixed limit if the observations were infinitely continued. This limit will be 
called the probability of the attribute considered within the given collective” 
[87]. 

The second principle is the so-called principle of randomness.
Heuristically it is evident that we cannot consider, for example, the
sequence $x = (0, 1, 0, 1, \ldots, 0, 1, \ldots)$ as a random object
(generated by a statistical experiment). However, the principle of the
statistical stabilization holds for $x$ and $P(0) = P(1) = 1/2$. Thus, we
need an additional restriction for sequences (3.2). This condition was
proposed by von Mises:

The limits of relative frequencies have to be stable with respect
to a place selection (a choice of a subsequence) in (3.2).

In particular, $x$ does not satisfy this principle. For example, if we choose
only odd places, then we obtain the zero sequence $(0, 0, \ldots)$, where
$P(0) = 1$, $P(1) = 0$.
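
The behaviour of the alternating sequence can be checked on a finite sample.
The Python sketch below is only a finite illustration of the two principles
(a collective is an infinite object); the helper `relative_frequency` is our
own name. Places are numbered from 1 as in (3.2), so that $x_1 = 0$,
$x_2 = 1$: the odd places carry the zeros and the even places carry the ones.

```python
from fractions import Fraction

def relative_frequency(sample, label):
    """nu_N(label; x) = n_N(label; x) / N for a finite sample x_1, ..., x_N."""
    return Fraction(sample.count(label), len(sample))

N = 1000
# x_1 = 0, x_2 = 1, x_3 = 0, ... (places numbered from 1).
x = [(j + 1) % 2 for j in range(1, N + 1)]

odd_places = x[0::2]   # x_1, x_3, x_5, ...
even_places = x[1::2]  # x_2, x_4, x_6, ...

print(relative_frequency(x, 0), relative_frequency(x, 1))  # 1/2 1/2
print(relative_frequency(odd_places, 0))                   # 1
print(relative_frequency(even_places, 0))                  # 0
```

The frequencies stabilize in the full sequence, but a simple place selection
changes them completely, so the sequence fails the principle of randomness.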

However, this very natural notion was the hidden bomb in the foundations
of von Mises’ theory. The main problem was to define a class of place
selections which induces a fruitful theory. The main and very natural
restriction is that a place selection in (3.2) cannot be based on the use of
attributes of elements. For example, we cannot consider a subsequence of
(3.2) constructed by choosing elements with the fixed label $\alpha \in L$.
Von Mises proposed the following definition of a place selection:

(PS) “a subsequence has been derived by a place selection if the decision
to retain or reject the nth element of the original sequence depends on the
number n and on label values $x_1, \ldots, x_{n-1}$ of the $(n - 1)$ preceding
elements, and not on the label value of the nth element or any following
element”,

see [87], p. 9. Thus a place selection can be defined by a set of functions
$f_1, f_2(x_1), f_3(x_1, x_2), f_4(x_1, x_2, x_3), \ldots$, each function
yielding the values 0 (rejecting the nth element) or 1 (retaining the nth
element).

Here are some examples of place selections: (1) choose those $x_n$ for which
$n$ is prime; (2) choose those $x_n$ which follow the word 01; (3) toss a
(different) coin; choose $x_n$ if the nth toss yields heads. The first two
selection procedures may be called lawlike, the third random. It is more or
less obvious that all of these procedures are place selections: the value of
$x_n$ is not used in determining whether to choose $x_n$.
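
The two lawlike examples can be written as functions of the index $n$ and of
the preceding labels $x_1, \ldots, x_{n-1}$, in the spirit of the functions
$f_n$ above. The Python sketch below is an illustration with hypothetical
names (`select_prime_places`, `select_after_01`, `apply_place_selection`)
and an arbitrary short sample. The rule is never given the value $x_n$
itself, which is what makes it a place selection.

```python
def is_prime(n):
    """Trial division; sufficient for the small indices used here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def select_prime_places(n, previous):
    """Rule (1): retain x_n when the index n is prime; label values are ignored."""
    return is_prime(n)

def select_after_01(n, previous):
    """Rule (2): retain x_n when the two preceding labels form the word 01."""
    return previous[-2:] == [0, 1]

def apply_place_selection(rule, x):
    """Apply f_n(x_1, ..., x_{n-1}) to each place; the rule never sees x_n."""
    return [x_n for n, x_n in enumerate(x, start=1) if rule(n, x[: n - 1])]

x = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]   # an arbitrary finite sample
print(apply_place_selection(select_prime_places, x))  # [1, 1, 1, 0]
print(apply_place_selection(select_after_01, x))      # [1, 0, 1]
```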

The principle of randomness ensures that no strategy using a place se- 
lection rule can select a subsequence that allows different odds for gambling 
than a sequence that is selected by flipping a fair coin. This principle can be 
called the law of excluded gambling strategy. 

The definition (PS) induced some mathematical problems. If a class
of place selections is too large then the notion of the collective is too
restricted (in fact, there are no sequences where probabilities are invariant
with respect to all place selections). This was the main point of criticism of 
von Mises’ theory. This problem has been investigated since the 1930s and 
solved only in the 1970s on the basis of Kolmogorov’s notion of algorithmic 
complexity [76]. 

However, von Mises himself was satisfied by the following operational so- 
lution of this problem. He proposed [88] to fix for any collective a class of 
place selections which depends on the physical problem described by this col- 
lective. Thus he removed this problem outside the mathematical framework. 

The frequency theory of probability is not, in fact, the calculus of proba- 
bilities, but it is the calculus of collectives which generates the corresponding 
calculus of probabilities. We briefly discuss some of the basic operations for 
collectives (see [88] for the details). 

As probability is defined on the basis of the principle of the statistical
stabilization of relative frequencies, it is possible to develop a quite
fruitful probabilistic calculus by using only this principle. A sequence
(3.2) which satisfies the principle of the statistical stabilization of
relative frequencies is said to be an S-sequence. Thus limits of relative
frequencies in an S-sequence $x$ need not be invariant with respect to some
class of place selections².

(a) Mixing and additivity. Let $x$ be a collective with the (finite) label
space $L_x$ and let $E = \{a_{j_1}, \ldots, a_{j_l}\}$ be a subset of $L_x$. The sequence (3.2) of $x$
is transformed into a new sequence $y_E$ by the following rule. If $x_j \in E$ then
we write 1; if $x_j \notin E$ then we write 0. Thus the label set $L_{y_E} = \{0, 1\}$. It is
easy to show that this sequence has the property of statistical stabilization
for its labels. For example,


$$P_{y_E}(1) = \lim \nu_N(E; x) = \lim \sum_{k=1}^{l} \nu_N(a_{j_k}; x) = \sum_{k=1}^{l} P_x(a_{j_k}), \qquad (3.3)$$


where $\nu_N(E; x) = \nu_N(1; y_E) = n_N(1; y_E)/N$ is the relative frequency of 1 in
$y_E$. To obtain (3.3) we have only used the fact that the addition is a continuous
operation on the field of real numbers $\mathbf{R}$. We can show that the sequence
$y_E$ also satisfies the principle of randomness, see [88]. Hence this is a new
collective. By this operation any collective $x$ generates a probability distribution
on the algebra $F_{L_x}$ of all subsets of $L_x$: $P(E) = P_{y_E}(1)$. Sometimes it will
be convenient also to denote this probability distribution by $P_x(E)$ to
distinguish probabilities corresponding to different collectives. Now we find the
properties of this probability. As $P(E) = \lim \nu_N(E; x)$ and $0 \le \nu_N(E) \le 1$,
then (by the elementary theorem of real analysis) $0 \le P(E) \le 1$. Hence the
probability must yield values in the segment $[0, 1]$. Further, as the collective
$y_{L_x}$ corresponding to the whole label set $L_x$ does not contain zeros, we obtain
that $\nu_N(L_x; x) = \nu_N(1; y_{L_x}) = 1$ and, consequently, $P(L_x) = 1$. Finally by
(3.3) we find that the set function $P : F_{L_x} \to [0, 1]$ is additive. Thus $P$ is
a normalized measure on the algebra $F_{L_x}$ which yields values in $[0, 1]$. We
remark that all these considerations can be repeated for S-sequences.
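The mixing operation and the additivity of relative frequencies behind (3.3) can be checked on a finite prefix. The following is a minimal sketch, assuming a short made-up sequence of labels in place of an infinite collective; the names `rel_freq` and `mix` are hypothetical. For a finite prefix the identity $\nu_N(E; x) = \sum_k \nu_N(a_{j_k}; x)$ holds exactly, and (3.3) is obtained from it by passing to the limit.

```python
from fractions import Fraction

def rel_freq(seq, label):
    """Relative frequency nu_N(label; seq) over the whole finite sequence."""
    return Fraction(seq.count(label), len(seq))

def mix(seq, E):
    """Mixing: write 1 if x_j belongs to E and 0 otherwise, giving y_E."""
    return [1 if x in E else 0 for x in seq]

# Illustrative finite prefix over the label set {a, b, c} (an assumption).
x = list("abcabcaabb")
E = {"a", "c"}

y_E = mix(x, E)
freq_of_one = rel_freq(y_E, 1)                  # nu_N(1; y_E)
sum_of_freqs = sum(rel_freq(x, a) for a in E)   # sum of nu_N(a; x) over a in E
```

Exact rational arithmetic (`Fraction`) is used so that the two sides can be compared without rounding error.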

² Of course, the use of S-sequences contradicts the philosophy of modern
probability theory, which is based on generalizations of Mises’ principle of
randomness (such as Kolmogorov complexity [76] and the Martin-Löf [83] theory of
statistical tests). However, it seems that all this machinery of randomness is
not used in quantum physics. Experimentalists are only interested in the
statistical stabilization of relative frequencies.

(b) Partition and conditional probabilities. Let $x$ be a collective and let
$A \in F_{L_x}$ and $P(A) \neq 0$. We derive a new sequence $x(A)$ by retaining only
those elements of $x$ which belong to $A$ and discarding all other elements
(thus the label set $L_{x(A)} = A$). This operation is obviously not a place
selection, since the decision to retain or reject an element of $x$ depends on
the label of just this element. The sequence $x(A)$ is again a collective, see
[88]. Suppose that $a_j \in A$ and let $y_A$ be the collective generated by $x$ with
the aid of the mixing operation. Then $P_{x(A)}(a_j) = \lim_{N \to \infty} \nu_N(a_j; x(A)) =
\lim_{k \to \infty} \nu_{N_k}(a_j; x(A))$, where $N_k \to \infty$ is an arbitrary sequence. As $P(A) \neq 0$
then $M_k = n_k(1; y_A) \to \infty$ (this is the number of labels belonging to $A$ among
the first $k$ elements of $x$). Thus


$$P_{x(A)}(a_j) = \lim_{k \to \infty} \nu_{M_k}(a_j; x(A)) = \lim_{k \to \infty} n_{M_k}(a_j; x(A))/M_k$$


$$= \lim_{k \to \infty} [n_{M_k}(a_j; x(A))/k] : [M_k/k] = P_x(a_j)/P_x(A).$$


We have used the property that $n_{M_k}(a_j; x(A))$, the number of $a_j$ among the first
$M_k$ elements of $x(A)$, is equal to $n_k(a_j; x)$, the number of $a_j$ among the first $k$
elements of $x$. The probability $P_{x(A)}(a_j)$ is the conditional probability of the
label $a_j$ if we know that a label belongs to $A$. It is denoted by $P(a_j/A) =
P_x(a_j/A)$. As a consequence of this formula we obtain Bayes’ formula:


$$P_{x(A)}(B) = \sum_{a_j \in B \cap A} P_x(a_j/A) = \sum_{a_j \in B \cap A} P_x(a_j)/P_x(A) = P_x(B \cap A)/P_x(A). \qquad (3.4)$$


In fact, this formula connects probabilities defined with respect to different
collectives. The left hand side probability is $P_{x(A)}$ and the right hand side
probabilities are $P_x$. As in the case of the ensemble probability, sometimes
we shall use the symbol $P_A(B)$ instead of $P(B/A)$. It is useful to remark that
$P_A : F_{L_x} \to [0, 1]$ is a measure normalized by 1. In particular, the probability
$P$ may be written as the conditional probability $P_{L_x}$.
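The partition operation and formula (3.4) can also be illustrated on a finite prefix. This is a sketch under the same kind of assumption as before: a short made-up label sequence stands in for the collective, and the helper names `rel_freq` and `partition` are hypothetical. The left hand side is computed in the extracted sequence $x(A)$, the right hand side in the original sequence $x$.

```python
from fractions import Fraction

def rel_freq(seq, labels):
    """Relative frequency of the event 'label belongs to labels' in seq."""
    return Fraction(sum(1 for x in seq if x in labels), len(seq))

def partition(seq, A):
    """Partition: retain only those elements of seq whose label belongs to A."""
    return [x for x in seq if x in A]

# Illustrative finite prefix over the label set {a, b, c} (an assumption).
x = list("abcabcaabb")
A = {"a", "b"}
B = {"a", "c"}

x_A = partition(x, A)
lhs = rel_freq(x_A, B)                      # frequency of B computed in x(A)
rhs = rel_freq(x, B & A) / rel_freq(x, A)   # ratio of frequencies computed in x
```

Note that `partition` inspects the label of the very element it retains or rejects, so it is not a place selection; this matches the remark in the text.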

As in the ensemble framework, here we can also obtain the formula of
total probability (2.8). Formula (2.8) is often applied in the wrong way:
probabilities $P(A_k)$ are found with respect to one collective and conditional
probabilities $P(C/A_k)$ with respect to another collective. To apply this formula
in the right way we have to use the index of a collective:


$$P_x(C) = \sum_k P_x(A_k) P_x(C/A_k). \qquad (3.5)$$

Formulas (2.5)-(2.7) can also be easily obtained in the frequency framework.

Remark 3.1. The Bayes formula in the frequency framework is a consequence
of the possibility of using the operation of partition for collectives. It should be
noticed that from the physical point of view the operation of partition is a physical
condition, which means that by extracting the collective $x(A)$ from the original
collective $x$ we do not change the property of belonging to $B$ or not. If the physical
system does not satisfy this condition, we cannot use the Bayes formula (3.4). This
does not mean that we cannot define the conditional probability $P_A(B)$. But we
cannot use (3.4) to find this probability.

It is important to remark that the conditional probabilities in (2.7) are
defined with respect to different collectives, $x(A)$ and $x(B)$. From the physical
point of view the connection (2.7) between these probabilities is possible only
for physical systems which satisfy the conditions discussed in Remark 3.1.

It is evident that we can also consider countable sets of attributes $L_x =
\{a_1, a_2, \ldots, a_m, \ldots\}$. If we use the additional condition $\sum_{j=1}^{\infty} P(a_j) < \infty$ for
the probabilities of labels then $P$ is a (discrete) measure on $F_{L_x}$. Moreover,
this measure is $\sigma$-additive. However, the generalization of the frequency
theory of probability to ‘continuous’ sets of attributes is a nontrivial
mathematical problem, see [88], [102].


4 Kolmogorov’s measure-theoretical theory 


The axiomatics of the modern probability theory was proposed by Andrei 
Nikolaevich Kolmogorov [74] in 1933 to provide a reasonable mathematical 
description of this theory. The basis of Kolmogorov axiomatics was prepared 
at the beginning of this century in France by investigations of Borel [15]- 
[16] and Frechet [40] on the measure-theoretic approach to probability. At 
the same time Kolmogorov used ideas of von Mises [86] about the frequency 
definition of probability (see remarks in [74]). 

By the Kolmogorov axiomatics the probability space is defined as the
triple $\mathcal{P} = (\Omega, \mathcal{F}, P)$, where $\Omega$ is an arbitrary set (points $\omega$ of $\Omega$ are said to
be elementary events), $\mathcal{F}$ is an arbitrary $\sigma$-algebra of subsets of $\Omega$ (elements
of $\mathcal{F}$ are said to be events), $P$ is a $\sigma$-additive measure on $\mathcal{F}$ which yields
values in the segment $[0, 1]$ of the real line and is normalized by the condition
$P(\Omega) = 1$.

Random variables on $\mathcal{P}$ are defined as measurable functions³ $\xi : (\Omega, \mathcal{F}) \to
(\mathbf{R}, \mathcal{B})$, where $\mathcal{B}$ is the Borel $\sigma$-algebra on the real line. We shall use the
symbol $RV(\mathcal{P})$ to denote the space of random variables over $\mathcal{P}$. The probability
distribution of $\xi \in RV(\mathcal{P})$ is defined as $P_\xi(B) = P(\xi^{-1}(B))$ for $B \in \mathcal{B}$. This
is a $\sigma$-additive measure on the Borel $\sigma$-algebra.

A. N. Kolmogorov motivated additivity of probability by additivity of
frequency probability (see formula (3.3)); he also used frequency reasons to take
the segment $[0, 1]$ as the range of values of a probabilistic measure. On the
other hand, the condition of $\sigma$-additivity was considered by Kolmogorov as
an additional mathematical (technical) condition to provide a fruitful
integration theory based on the Lebesgue integral. In fact, Kolmogorov started
with finitely additive probabilities defined on algebras of sets. The spaces
with $\sigma$-additive probabilities defined on $\sigma$-algebras were called generalized
probability spaces.

The Kolmogorov theory also contains the additional axiomatic definition
of conditional probabilities: $P(B/A)$ is defined by formula
(2.4). Kolmogorov did not give any motivation for this definition in his book
[74]. However, as he gave a clear motivation of all other properties of $P$ on
the basis of the von Mises frequency theory, it seems that he used the
same frequency reasons for (2.4). In Kolmogorov’s model two events $A$ and
$B$ are said to be independent if


$$P(A \cap B) = P(A)P(B) \qquad (4.1)$$


or 


P(B/A) = P(B), P(A) > 0. (4.2) 
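A finite Kolmogorov space makes these definitions concrete. The sketch below is an illustration only: it assumes the space of two tosses of a fair coin with uniform weights, takes the definition (2.4) to be $P(B/A) = P(A \cap B)/P(A)$ (the same form as (3.4)), and uses hypothetical helper names `prob` and `cond`.

```python
from fractions import Fraction
from itertools import product

# Elementary events: two tosses of a coin; uniform weights are an assumption.
omega = [a + b for a, b in product("HT", repeat=2)]   # HH, HT, TH, TT
weight = {w: Fraction(1, 4) for w in omega}

def prob(event):
    """P(event) for an event given as a set of elementary events."""
    return sum((weight[w] for w in event), Fraction(0))

def cond(b, a):
    """Conditional probability P(b/a) = P(a & b) / P(a), assuming P(a) > 0."""
    return prob(a & b) / prob(a)

first_heads = {w for w in omega if w[0] == "H"}
second_heads = {w for w in omega if w[1] == "H"}
both_heads = first_heads & second_heads

# Independence in the sense of (4.1) and (4.2) for the two single-toss events.
independent_41 = prob(first_heads & second_heads) == prob(first_heads) * prob(second_heads)
independent_42 = cond(second_heads, first_heads) == prob(second_heads)

# A dependent pair: 'both heads' and 'first toss heads'.
dependent_pair = prob(both_heads & first_heads) != prob(both_heads) * prob(first_heads)
```

Here all three flags evaluate to true: the two single-toss events satisfy both (4.1) and (4.2), while the pair "both heads" and "first toss heads" violates (4.1).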


³ Thus $\xi^{-1}(B) \in \mathcal{F}$ for every $B \in \mathcal{B}$.

In the standard framework of Lebesgue integration we start with a
$\sigma$-additive measure $\mu$ defined on some algebra $F$ and then $\mu$ is extended over
the $\sigma$-algebra $\mathcal{F}$ generated by $F$ (Borel $\sigma$-algebra). This extension procedure,
which is well defined from the mathematical point of view, is not so innocent
from the probabilistic point of view. Kolmogorov remarked: “Even if the sets
(events) $A$ of $F$ can be interpreted as actual and (perhaps only approximately)
observed events, it does not, of course, follow from this that the sets of $\mathcal{F}$
reasonably admit of such an interpretation. Thus there is the possibility that while
a field of probability $(F, P)$ may be regarded as the image (idealized, however)
of actual random events, the extended field of probability $(\mathcal{F}, P)$ will still remain
merely a mathematical structure. Thus sets of $\mathcal{F}$ are merely ideal events to which
nothing corresponds in the outside world. However, if reasoning which utilizes
probabilities of such ideal events leads us to a determination of the probability of
an actual event of $F$, then, from an empirical point of view also, this determination
will automatically fail to be contradictory”, see [74], p.17. It should be noticed
that the adherents of Kolmogorov’s measure-theoretical approach to
probability theory did not pay much attention to these ideas of Kolmogorov. As a
result, manipulations with probabilities of abstract events belonging to $\mathcal{F}$ were
considered as real probabilistic investigations. Moreover, if we need not pay
attention to the difference between real and abstract probabilities, we could
in principle omit the concrete probabilistic model from our considerations
and operate with ‘events’ belonging to abstract $\sigma$-algebras. This is the main
problem of the worldwide use of Kolmogorov’s measure-theoretical approach.


Remark 4.1. For example, Cramér, who used the Kolmogorov axiomatics
to create the mathematical theory of statistics, had another point of view on
the problem of verification: “any probability assigned to a specific event must,
in principle, be liable to verification” [23]. The question of verification was the
cornerstone of the von Mises theory for the continuous label set $S$. He showed that
in the case $L_S = \mathbf{R}$ (or $\mathbf{R}^n$) a probability measure of an event $E$ has the frequency
interpretation iff the measure of the boundary of $E$ is equal to 0, [88].


On the other hand, Kolmogorov himself actively developed the viewpoint
that probability theory is a purely mathematical theory. Therefore the
concrete structure of a set algebra (or $\sigma$-algebra) does not play any role in
probabilistic considerations. In his manifesto “General Measure Theory and
Probability Calculus”, 1929 (see [99]), he wrote: “To outline the context of
theory, it suffices to single out from probability theory those elements that bring
out its intrinsic logical structure, but have nothing to do with the specific meaning
of theory.”


Finally we remark that in Kolmogorov’s approach Bayes’ formula (2.4)
is just the definition of a conditional probability. I would like to underline this
fact. In my experience many scientists working in applications of
probability are sure that Bayes’ formula is a theorem. But this is right only
for the ensemble and frequency approaches. On the other hand, the formula
of total probability (2.8) is a theorem of Kolmogorov’s theory. Here
it holds true for a countable family of sets $A_k \in \mathcal{F}$, $P(A_k) > 0$, $k = 1, \ldots$,
such that $\bigcup_{k=1}^{\infty} A_k = \Omega$ and $A_k \cap A_l = \emptyset$, $k \neq l$: for every $C \in \mathcal{F}$, $P(C) =
\sum_{k=1}^{\infty} P(A_k) P(C/A_k)$. To obtain this formula, we need to use the $\sigma$-additivity
of probability and the definition (Bayes’ formula) of conditional probabilities.


5 Kolmogorov’s ideas on probability


It should be noticed that before creating the system of axioms of probability
theory, A. N. Kolmogorov discussed ([99], 1929) some examples of
‘generalized probabilities’ which could not be described by his axiomatics. Moreover,
probably we need not call these objects ‘generalized probabilities’. It seems
more natural to call ordinary probabilities (described by Kolmogorov’s
axiomatics) ‘restricted probabilities’.

There is another side of the common use of Kolmogorov’s approach which
is not so visible as the disappearance of concrete probabilistic spaces. This
is the idea that only Lebesgue measurable sets could play some role in
probabilistic considerations. Of course, this is a consequence of the fact that
Kolmogorov discussed merely the Lebesgue extension [99] (or the Borel extension
[74]). However, in principle some sets which are not Lebesgue measurable
may appear in probabilistic models connected with some natural phenomena.
We shall discuss such a model in Chapter 2. On the other hand, Kolmogorov
discussed in [99] a non-Lebesgue extension of the linear Lebesgue measure $\mu$
on the segment $[0, 1]$, namely, the result of Banach that $\mu$ can be extended
to a measure $\tilde{\mu}$ defined on the ($\sigma$-)algebra $F_{[0,1]}$ of all subsets of $[0, 1]$. It seems
that Kolmogorov considered this measure as a good candidate to be a
probability. He also considered the multidimensional case and pointed out that
an extension $\tilde{\mu}$ on $F_{[0,1]^n}$ of the Lebesgue measure $\mu$ on $[0, 1]^n$ can be obtained
by using the metric equivalence of a cube $[0, 1]^n$, $n = 2, 3, \ldots$, and the segment
$[0, 1]$. Then he mentioned that in the case $n > 2$ such a measure does not
satisfy the principle of equality of the measure of congruent sets. This is
a consequence of the example on the decomposition of a sphere into three sets
being congruent to the sum of two others to within a countable set (see, for
example, [48] for the proof):

Theorem 5.1. A sphere $S$ can be decomposed into disjoint sets $S =
A \cup B \cup C \cup Q$ such that: (i) the sets $A, B, C$ are congruent to each other; (ii)
the set $B \cup C$ is congruent to each of the sets $A, B, C$; (iii) $Q$ is countable.

We continue to study the question of the domain of definition of probability.
As we have seen, the ensemble approach does not imply automatically
that the system of sets (events) $F(\pi_S)$ (corresponding to properties $\pi_S$ of
the ensemble $S$) must be an algebra. On the other hand, if Kolmogorov’s
axiomatics is used, then we have to start with (at least) an algebra. However,
there may be random phenomena which do not possess the structure of an
algebra. Why must the union $A \cup B$ of two events $A, B$ always be an event?
Why must the complement $D = \Omega \setminus C$ of an event $C$ always be an event?
It is interesting that, before proposing the general axiomatics of probability
theory [74] (1933), Kolmogorov discussed the problem of the domain of
definition of probability [99] (1929). At that time he had the viewpoint which
coincided with our viewpoint: “It is also doubtful if a measure connected with
some problem in probability calculus need be closed” (i.e., defined on an
algebra). In [99] Kolmogorov pointed out that “one should not assume, however,
that the existence of measures of two intersecting sets implies the existence of
measure for their sum or difference: there are certain important measures without
this property.” In particular, he discussed the following example.

Example 5.1. (Density of natural numbers; see, for example, [45], [91]
for the details). For a subset $A \subset \mathbf{N}$ the quantity

$$\delta(A) = \lim_{N \to \infty} \frac{|A \cap \{1, \ldots, N\}|}{N}$$

is called the density of $A$ if the limit exists. Let $G_\delta$ denote the collection of
all subsets of $\mathbf{N}$ which admit density. It is evident that each finite $A \subset \mathbf{N}$
belongs to $G_\delta$ and $\delta(A) = 0$. It is also evident that each subset $B = \mathbf{N} \setminus A$,
where $A$ is finite, belongs to $G_\delta$ and $\delta(B) = 1$ (in particular, $P(\mathbf{N}) = 1$). The
reader can easily find examples of sets $A \in G_\delta$ such that $0 < \delta(A) < 1$.
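The density can be explored numerically by evaluating the ratio $|A \cap \{1, \ldots, N\}|/N$ at a finite $N$. The sketch below is only a finite-$N$ approximation and proves nothing about the existence of the limit; the function name `density_estimate` and the chosen sets are assumptions made for illustration.

```python
def density_estimate(pred, N):
    """Return |A ∩ {1, ..., N}| / N for the set A = {m : pred(m)}."""
    return sum(1 for m in range(1, N + 1) if pred(m)) / N

N = 3000

multiples_of_three = density_estimate(lambda m: m % 3 == 0, N)   # density 1/3
finite_set = density_estimate(lambda m: m <= 10, N)              # tends to 0
cofinite_set = density_estimate(lambda m: m > 10, N)             # tends to 1
```

The multiples of three give an example of a set with density strictly between 0 and 1; the other two values approach 0 and 1 as `N` grows, in agreement with the remarks on finite sets and their complements.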

Proposition 5.1. Let $A_1, A_2 \in G_\delta$ and $A_1 \cap A_2 = \emptyset$. Then $A_1 \cup A_2 \in G_\delta$
and

$$P(A_1 \cup A_2) = P(A_1) + P(A_2). \qquad (5.1)$$

Proof. As $A_1 \cap A_2 = \emptyset$, then $|(A_1 \cup A_2) \cap \{1, \ldots, N\}| = |A_1 \cap \{1, \ldots, N\}| +
|A_2 \cap \{1, \ldots, N\}|$. □

Proposition 5.2. Let $A_1, A_2 \in G_\delta$. The following conditions are equivalent:

1) $A_1 \cup A_2 \in G_\delta$; 2) $A_1 \cap A_2 \in G_\delta$;
3) $A_1 \setminus A_2 \in G_\delta$; 4) $A_2 \setminus A_1 \in G_\delta$.

There are standard formulas:

$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2), \qquad (5.2)$$

$$P(A_1 \setminus A_2) = P(A_1) - P(A_1 \cap A_2). \qquad (5.3)$$

Proof. We have

$$|(A_1 \cup A_2) \cap \{1, \ldots, N\}| = |A_1 \cap \{1, \ldots, N\}| + |A_2 \cap \{1, \ldots, N\}| - |(A_1 \cap A_2) \cap \{1, \ldots, N\}|.$$

Therefore, if, for example, $A_1 \cap A_2 \in G_\delta$ then there exists a limit of the right
hand side. It implies $A_1 \cup A_2 \in G_\delta$ and (5.2) holds. Other implications are
proved in the same way. □

It is possible to find sets $A, B \in G_\delta$ such that, for example, $A \cap B \notin G_\delta$.
Let $A$ be the set of even numbers. Take any subset $C \subset A$ which has no
density. In fact, you can find $C$ such that

$$\frac{1}{N} |C \cap \{1, 2, \ldots, N\}|$$

is oscillating. For each $n$ there are two cases: $C \cap \{2n\} = \{2n\}$ or $C \cap \{2n\} = \emptyset$. Set

$$B = C \cup \{2n - 1 : C \cap \{2n\} = \emptyset\}.$$

Then both $A$ and $B$ have density one half. But $A \cap B = C$ has no density.
Thus $G_\delta$ is not a set algebra.
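One concrete choice of $C$ can be checked by direct counting. The construction below is an assumption made for illustration (the text only asks for some $C \subset A$ with oscillating relative frequency): $2n$ is put into $C$ exactly when $\lfloor \log_2 n \rfloor$ is even, so membership alternates over the dyadic blocks $[1, 2), [2, 4), [4, 8), \ldots$ of $n$. The function names are hypothetical.

```python
from fractions import Fraction

def in_A(m):
    """A: the even numbers."""
    return m % 2 == 0

def in_C(m):
    """C: even numbers 2n with floor(log2(n)) even (illustrative choice)."""
    if m % 2:
        return False
    n = m // 2
    return (n.bit_length() - 1) % 2 == 0

def in_B(m):
    """B = C together with those odd 2n - 1 for which 2n is not in C."""
    if m % 2 == 0:
        return in_C(m)
    return not in_C(m + 1)

def rel_density(pred, N):
    """Exact value of |{m <= N : pred(m)}| / N."""
    return Fraction(sum(1 for m in range(1, N + 1) if pred(m)), N)

# B contains exactly one element of each pair {2n - 1, 2n}.
density_B = rel_density(in_B, 2046)
# The relative frequency of C keeps swinging between roughly 1/6 and 1/3.
low_C = rel_density(in_C, 2 * (4 ** 5 - 1))       # N = 2046
high_C = rel_density(in_C, 2 * (2 * 4 ** 5 - 1))  # N = 4094
```

At every even $N$ the relative frequency of $B$ equals $1/2$ exactly, while along $N = 2(4^t - 1)$ the relative frequency of $C$ equals $1/6$ and along $N = 2(2 \cdot 4^t - 1)$ it stays above $1/3$, so it has no limit.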

In 1929 A. N. Kolmogorov wrote [99]: “It is not known whether every
measure is closable. If closure is possible, then it is not necessarily closable in only one
way. It would seem that it is very difficult to find a measure that closes the
measure given by the density of natural numbers.” We can prove that the density
of natural numbers can be closed (extended to the algebra $F_{\mathbf{N}}$ of all subsets
of $\mathbf{N}$), see Theorem 5.4.

To formalize our considerations on the density of natural numbers, we 
propose the following definition. 

Definition 5.1. A system of subsets $G$ of a set $\Omega$, which has the properties
described by Proposition 5.2 and contains $\emptyset$ and $\Omega$, is called a set
semi-algebra.

Definition 5.2. A function $P : G \to [0, 1]$, where $G$ is a semi-algebra,
is said to be a probability semi-measure if it satisfies the additivity condition
(5.1) and $P(\Omega) = 1$.

Definition 5.3. The system $\mathcal{P} = (\Omega, G, P)$, where $P$ is a probability
semi-measure on a semi-algebra $G$, is called a semi-probability space.

Unfortunately we could not say anything more about such a generalization 
of a probability space, because the theory of integration with respect to 
probability semi-measures is not well developed. 

We present the simplest construction of an extension of a measure $\mu$ to
the algebra of all subsets. This construction is based on a representation
of $\mu$ by a continuous linear functional on some space of functions and the
application of the Hahn-Banach theorem.

Theorem 5.2. Let $\mu$ be a (finitely additive) measure on an algebra $F$
of subsets of $\Omega$. Then there exists a finitely additive extension $\tilde{\mu}$ of $\mu$ on the
algebra $F_\Omega$ of all subsets of $\Omega$.

Proof. We introduce the space of bounded functions

$$B(\Omega) = \{f : \Omega \to \mathbf{R} : \|f\|_\infty = \sup_{\omega} |f(\omega)| < \infty\}.$$

This is a normed space. We set

$$\mathcal{L}(F) = \Big\{f = \sum_{k=1}^{n} c_k I_{A_k}, \ c_k \in \mathbf{R}, \ A_k \in F, \ \bigcup_{k=1}^{n} A_k = \Omega\Big\},$$

where $I_A$ is the characteristic function of the set $A$. This is a normed
subspace of the space $B(\Omega)$. We define a linear functional $l_\mu : \mathcal{L}(F) \to \mathbf{R}$ by

$$l_\mu(f) = \sum_{k=1}^{n} c_k \mu(A_k) \quad \text{for} \quad f = \sum_{k=1}^{n} c_k I_{A_k}.$$

This functional is well defined (i.e., $l_\mu(f)$ does not depend on a representation
of $f$). If

$$f = \sum_{k=1}^{n_1} c_k^1 I_{A_k^1} \quad \text{and} \quad f = \sum_{k=1}^{n_2} c_k^2 I_{A_k^2}$$

then it is always possible to find sets $B_l \in F$ ($l = 1, \ldots, N$), $B_l \cap B_m = \emptyset$, $l \neq
m$, $\Omega = \bigcup_{l=1}^{N} B_l$, such that all sets $A_k^j$, $j = 1, 2$, are represented as unions of
sets $B_l$ (of course, here we use the structure of an algebra). We write

$$f = \sum_{l=1}^{N} d_l I_{B_l}, \quad \text{where } B_l \in F, \ B_l \cap B_m = \emptyset, \ l \neq m, \ \Omega = \bigcup_{l=1}^{N} B_l. \qquad (5.4)$$

By using finite additivity of $\mu$ we obtain that

$$\sum_{k=1}^{n_1} c_k^1 \mu(A_k^1) = \sum_{l=1}^{N} d_l \mu(B_l) = \sum_{k=1}^{n_2} c_k^2 \mu(A_k^2).$$

This functional is continuous (bounded) on the normed space $\mathcal{L}(F)$, because
on the basis of representation (5.4) we obtain:

$$|l_\mu(f)| = \Big|\sum_{l=1}^{N} d_l \mu(B_l)\Big| \le \max_l |d_l| \sum_{l=1}^{N} \mu(B_l) = \mu(\Omega) \sup_{\omega} |f(\omega)| = \mu(\Omega) \|f\|_\infty.$$

In fact, the norm $\|l_\mu\| = \sup\{|l_\mu(f)| : \|f\|_\infty \le 1, f \in \mathcal{L}(F)\}$ of $l_\mu$ is equal to
$\mu(\Omega)$.

We apply the following well known theorem of functional analysis.

Theorem 5.3. (Hahn-Banach) Let $E$ be a normed linear space and let
$U$ be its linear subspace. Every continuous linear functional $l : U \to \mathbf{R}$ can
be extended to a continuous linear functional $L : E \to \mathbf{R}$ in such a way that
the norms of the functionals $l$ and $L$ coincide: $\|L\| = \|l\|$.

An extension on the algebra $F_\Omega$ of the measure $\mu$ is defined by $\tilde{\mu}(A) =
L_\mu(I_A)$, $A \in F_\Omega$, where $L_\mu : B(\Omega) \to \mathbf{R}$ is an extension of the continuous linear
functional $l_\mu : \mathcal{L}(F) \to \mathbf{R}$ given by the Hahn-Banach theorem. Linearity of
$L_\mu$ implies that $\tilde{\mu}$ is finitely additive.

Finally we have to prove that $\tilde{\mu}(A) \ge 0$ for any $A \in F_\Omega$. Suppose that
there exists $A \in F_\Omega$ such that $c = \tilde{\mu}(A) < 0$. Then, for the complement
$\bar{A} = \Omega \setminus A$, we have $\tilde{\mu}(\bar{A}) = \mu(\Omega) - c > \mu(\Omega)$.
On the other hand, $\tilde{\mu}(\bar{A}) = L_\mu(I_{\bar{A}}) \le \|I_{\bar{A}}\|_\infty \|L_\mu\| = \|l_\mu\| = \mu(\Omega)$. This
contradiction implies that $\tilde{\mu}$ is non-negative. □

On the other hand, $\tilde{\mu}$ may fail to be $\sigma$-additive even if $\mu$ is $\sigma$-additive.
It seems that an answer to the question:

“Is it possible in the general case to construct a $\sigma$-additive extension $\tilde{\mu}$
on the algebra $F_\Omega$ of a $\sigma$-additive measure $\mu$?”

is unknown.

Another difficulty is that the proof of the Hahn-Banach theorem is based
on the axiom of choice. Therefore we also have to use this axiom to obtain
an extension of probability. However, the place of the axiom of choice in
quantum physics is not clear. Thus it is not easy to find the range of possible
applications of probabilities extended on $F_\Omega$ with the aid of the Hahn-Banach
theorem.

It seems that in the general case it is impossible to obtain the existence of an
extension $\tilde{\mu}$ of $\mu$ without the axiom of choice.

However, the main problem is non-uniqueness of an extension $\tilde{\mu}$. By our
construction $\tilde{\mu}$ is determined by an extension $L_\mu$ of the functional $l_\mu$. In
general such an extension is not unique.

Corollary 5.1. Let $P$ be a probabilistic measure on an algebra $F$ of
subsets of $\Omega$. Then there exists a finitely additive extension $\tilde{P}$ of $P$ on the
algebra $F_\Omega$ of all subsets of $\Omega$.

Proof. By the Hahn-Banach theorem $1 = P(\Omega) = \|l_P\| = \|L_P\|$. As, for
each $A \in F_\Omega$, $\|I_A\|_\infty = 1$, we obtain that $\tilde{P}(A) = L_P(I_A) \le \|L_P\| \|I_A\|_\infty \le 1$.
Thus $\tilde{P}$ is a (finitely additive) probabilistic measure. □

In some physical models we may use ‘probabilities’ defined on the algebra
$F_\Omega$ of all subsets of $\Omega$ which are obtained via the Hahn-Banach theorem. As
it has been noticed, in general these probabilities are not $\sigma$-additive.
However, finite additivity is merely a mathematical problem. The real problem
is non-uniqueness of an extension $\tilde{P}$. For instance, we start with a $\sigma$-additive
probability $P$ defined on a $\sigma$-algebra $\mathcal{F}$. Let us assume that some events
$A \in F_\Omega \setminus \mathcal{F}$ have a physical meaning. Let $\tilde{P}_1$ and $\tilde{P}_2$ be different extensions
of $P$ to the algebra $F_\Omega$. In principle, $\tilde{P}_1(A) \neq \tilde{P}_2(A)$. As mathematical
arguments are not sufficient to fix a ‘probability’, we need to use some additional
physical arguments to obtain the ‘right extension’.


It seems that the situation with non-uniqueness is even more complicated. As
in the above considerations, let us start with a $\sigma$-additive probability $P$ defined
on a $\sigma$-algebra $\mathcal{F}$. Let $P_L$ be the Lebesgue extension of $P$ on the $\sigma$-algebra $F_L$
of Lebesgue measurable sets. Then $P_L$ is the unique $\sigma$-additive extension of $P$ on
$F_L$. On the other hand, there may exist finitely additive extensions $\tilde{P}$ of $P$ on $F_L$
which do not coincide with $P_L$. As we have discussed many times, the condition
of $\sigma$-additivity is a purely mathematical condition. Therefore from the physical
viewpoint there are no reasons to choose only the $\sigma$-additive extension. Thus
the standard choice, $P_L(A)$, of probability for events $A \in F_L$ does not seem so
natural from the physical viewpoint. We think that some paradoxes in the quantum
formalism are a consequence of the common opinion that only the Lebesgue
extension $P_L$ gives the ‘right physical probability’. In particular, the proof of the famous
Bell’s inequality is based on such an assumption. Thus the Einstein-Podolsky-Rosen
paradox (see Chapter 2) might be a consequence of the conventional (but probably
non-physical) choice of an extension of probability.

We have discussed norm-preserving extensions of probability obtained via the
Hahn-Banach theorem. In principle there may exist extensions $L_\mu : B(\Omega) \to \mathbf{R}$ of
the linear functional $l_\mu : \mathcal{L}(F) \to \mathbf{R}$ which increase the norm: $\|L_\mu\| > \|l_\mu\|$. If we define
an extension of a measure $\mu : F \to \mathbf{R}_+$ with the aid of such an extension, $\tilde{\mu}(A) =
L_\mu(I_A)$, $A \in F_\Omega$, then we could not be sure that $\tilde{\mu}$ is non-negative. In this way,
starting with a probability $P : F \to [0, 1]$, we may obtain generalized probabilities
$\tilde{P} : F_\Omega \to \mathbf{R}$ with negative values as well as with values which are larger than 1.
We shall see in Chapter 3 that such generalized probabilities may have physical
meaning. We note that if $P : F \to [0, 1]$ is a $\sigma$-additive probability, then it may
be that a (norm-increasing) extension $\tilde{P} : F_\Omega \to \mathbf{R}$ is also $\sigma$-additive. In such a
case we obtain a signed probability measure (a charge), see Chapter 3.


Moreover, there may exist a norm-increasing extension $\tilde{P}$ of a $\sigma$-additive probability
$P$ (defined on a $\sigma$-algebra $\mathcal{F}$) on the Lebesgue $\sigma$-algebra $F_L$. It may be that
$\tilde{P}(A) < 0$ (and $\tilde{P}(A) > 1$) even for some $A \in F_L$. Thus even for events $A \in
F_L$, which are typically considered as ‘physical events’, we could obtain negative
generalized probabilities. Such negative probabilities have natural ensemble and
frequency interpretations (see Chapter 3). There must be some special physical
reasons to consider only the norm-preserving extension $P_L$ of $P$ on the Lebesgue
$\sigma$-algebra $F_L$. If there are no such reasons, then in principle we can use signed
probabilities $\tilde{P} : F_L \to \mathbf{R}$ for the description of a physical model. It seems (see
Chapter 3) that there are physical reasons to use such signed probabilities (instead
of the standard probability $P_L$) in some physical models. In particular it may be
that $P_L(A) = P_L(B) = \lambda$, but $\tilde{P}(A) \neq \tilde{P}(B)$. In such a case we can split the
conventional probability $\lambda$ into a set of generalized probabilities. In particular,
zero conventional probability can be split into a set of generalized probabilities
(see Chapters 3 and 4 for the details).

We turn back to the density of natural numbers. 

Theorem 5.4. The density of natural numbers $\delta : G_\delta \to [0, 1]$ can be
extended to a finitely additive measure $\tilde{\delta} : F_{\mathbf{N}} \to [0, 1]$.

Proof. We apply again the Hahn-Banach theorem. However, as $G_\delta$ is not
an algebra, we could not directly apply the scheme of the proof of Theorem
5.2. Denote by $LIM(\mathbf{N})$ the subspace of the normed space $B(\mathbf{N})$ (of all
bounded functions $f : \mathbf{N} \to \mathbf{R}$) consisting of all functions $f$ for which there
exists the mean value:

$$l_\delta(f) = \lim_{n \to \infty} \frac{1}{n} (f(1) + \cdots + f(n)).$$

The $l_\delta : LIM(\mathbf{N}) \to \mathbf{R}$ is a continuous linear functional and $l_\delta(I_A) = \delta(A)$ for
each $A \in G_\delta$. By the Hahn-Banach theorem $l_\delta$ can be extended to a continuous
linear functional $L_\delta : B(\mathbf{N}) \to \mathbf{R}$ and $1 = \delta(\mathbf{N}) = \|l_\delta\| = \|L_\delta\|$. We set
$\tilde{\delta}(A) = L_\delta(I_A)$ for $A \in F_{\mathbf{N}}$. The linearity of $L_\delta$ implies that $\tilde{\delta} : F_{\mathbf{N}} \to \mathbf{R}$
is additive. By the same reasons as in Theorem 5.2 we obtain that $\tilde{\delta}$ is
non-negative. □

If $A \in F_{\mathbf{N}} \setminus G_\delta$, then

$$\tilde{\delta}(A) \neq \lim_{n \to \infty} \frac{|A \cap \{1, 2, \ldots, n\}|}{n}$$

(the limit on the right hand side does not exist).
Thus the frequency verification of the event $A$ is impossible (the principle of
the statistical stabilization is violated; compare with Chapters 2 and 4).

An extension of $\delta$ from the semi-algebra $G_\delta$ on the $\sigma$-algebra $F_{\mathbf{N}}$ given by Theorem
5.4 is not unique. If in some physical model some sets $A \in F_{\mathbf{N}} \setminus G_\delta$ are considered
as physical events, then there must be special physical reasons to choose one or
another extension of $\delta$. And we have to remember that the principle of the statistical
stabilization is violated for events $A \in F_{\mathbf{N}} \setminus G_\delta$. As in the case of measures defined
on algebras, there may exist extensions $L_\delta$ of $l_\delta$ which do not preserve the norm:
$\|L_\delta\| > 1$. In such a case an extension $\tilde{\delta}$ corresponding to $L_\delta$ can take negative
values.

In principle we might consider Theorem 5.4 as the answer to the question
of A. N. Kolmogorov on the possibility of closing the density of natural numbers
$\delta$. However, Kolmogorov wanted to find a measure which closes $\delta$. Theorem
5.4 is not constructive (it is based on the axiom of choice). Thus it does
not really give the answer to Kolmogorov’s question. The construction which has
been used in Theorem 5.4 could not be applied to an arbitrary semi-measure
$\mu$. Thus we do not know the answer to the question: “Is it possible to close
an arbitrary semi-measure?” (compare with Kolmogorov, 1929).


6 Measure-theoretical approach and interpretations of probability


Now we are going to discuss possible probability interpretations of the Kol- 
mogorov measure-theoretical approach (the mathematical theory of a special 
class of measures). As we have seen, the probabilistic measures can be associ- 
ated with all probability models (classical, ensemble, and frequency). There- 
fore it is in principle possible to use the measure-theoretical formalism and 
classical, ensemble or frequency interpretation. However, A. N. Kolmogorov 
proposed not only a mathematical formalism but also an interpretation of 
this formalism. We shall start with this interpretation. 

1. Ensemble-frequency interpretation. Kolmogorov interpreted a
probability in the following way: “... we may assume that to an event $A$
which may or may not occur under conditions $\Sigma$ is assigned a real number
$P(A)$ which has the following characteristics: (a) one can be practically
certain that if the complex of conditions $\Sigma$ is repeated a large number of times,
$N$, then if $n$ be the number of occurrences of event $A$, the ratio $n/N$ will
differ very slightly from $P(A)$; (b) if $P(A)$ is very small, one can be
practically certain that when conditions $\Sigma$ are realized only once the event $A$ would
not occur at all”. This interpretation is a mixture of the frequency and
ensemble interpretations. In fact, (a) is the frequency interpretation and (b)
is the ensemble interpretation. However, we cannot identify Kolmogorov’s
interpretation with any of these interpretations (for example, we may not
assume (see [88], p.5) that each infinite repetition of $\Sigma$ will generate a
collective). This mixture of interpretations generated some problems and played a
negative role in applications of probability theory. Kolmogorov did not
separate the proportion (measure) in an ensemble and the frequency of realizations.
Moreover, it seems that he often reduced the proportion in an ensemble
to the proportion (2.1) for possible cases⁴. For example, he considered the
experiment of tossing a coin twice and obtained a finite space of elementary
events $\Omega = \{HH, HT, TH, TT\}$, where the labels $H, T$ are used for the sides
of a coin. I think that Kolmogorov understood very well the weakness of
his interpretation. For this reason he considered this problem again 30 years
later and proposed the theory of algorithmic complexity of random sequences
[76]. However, the latter theory is nothing other than an attempt to justify
the frequency probability theory of von Mises.

Remark 6.1. As the ensemble-frequency interpretation is based on both
frequency and proportional arguments, the range of applications of Bayes’
formula (2.4) is restricted by Remarks 2.1 and 3.1. In fact the Bayes formula
is an additional postulate of the Kolmogorov axiomatics. In principle we can
use the Kolmogorov theory (probability spaces) without Bayes’ formula (2.4).
This theory will describe physical systems with a violation of (2.4). This
framework was developed by Accardi [1]; we shall discuss it in connection with
quantum theory.

As we have seen, there are two (essentially different) contributions of
Kolmogorov to probability theory. The first is the measure-theoretical
approach and the second is the ensemble-frequency interpretation. The first is
purely mathematical and the second is phenomenological. Of course, it is
possible to combine Kolmogorov’s measure-theoretical formalism with other
interpretations of probability. However, we have to pay attention to the
problem that the use of a specific interpretation imposes some restrictions on
Kolmogorov’s measure-theoretical approach. We start with the ensemble
probability.

2. Measure-theoretical approach and the ensemble interpretation.
Let $S$ be an arbitrary (possibly infinite) ensemble. Let
$\mathcal{P}_S = (S, \mathcal{F}, P_S)$ be Kolmogorov’s probability space
based on $S$. This space can be used for probabilistic analysis on $S$.
However, we have to remember that the set $\pi_S$ of properties can differ
from the set of random variables $RV(\mathcal{P}_S)$. There can exist random
variables $\xi \in RV(\mathcal{P}_S)$ (and, in particular, sets
$A \in \mathcal{F}$) which are not properties of $s \in S$. As was mentioned
by Kolmogorov, analysis based on $\xi \in RV(\mathcal{P}_S)$ can give some
results for elements $\eta \in \pi_S$ which have no real physical meaning. On
the other hand, there can exist properties $\eta \in \pi_S$ which are not
random variables (these are non-measurable maps $\eta$ on $\mathcal{P}_S$).
Another important point is that all probability distributions depend on the
ensemble $S$.

⁴ Here Kolmogorov followed the historical tradition.


Let us consider the following example. Let
$\mathcal{P}_j = (\Omega_j, \mathcal{F}_j, P_j)$, $j = 1,2$, be Kolmogorov’s
probability spaces and let $\xi_j : \Omega_j \to \mathbf{R}$ be random
variables with probability distributions $P_{\xi_j}$. Then in Kolmogorov’s
formalism it is always possible to construct a probability space
$\mathcal{P} = (\Omega, \mathcal{F}, P)$ such that there are well defined
random variables $\tilde{\xi}_j \in RV(\mathcal{P})$, $j = 1,2$, such that
$P_{\tilde{\xi}_j} = P_{\xi_j}$. We can simply set
$\Omega = \Omega_1 \times \Omega_2$,
$\mathcal{F} = \mathcal{F}_1 \otimes \mathcal{F}_2$, $P = P_1 \otimes P_2$
and $\tilde{\xi}_j(\omega_1, \omega_2) = \xi_j(\omega_j)$. However, it does
not sound reasonable that we can do the same thing in the ensemble framework.
Let $\Omega_1 = \Omega_2 = S$ and $\xi_j \in \pi_S$, $j = 1,2$. In general it
is not sensible to use the ensemble $\Omega = S \times S$ for representing
properties of the original ensemble $S$.
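
The product construction used in this example can be sketched for finite spaces. The following is only a minimal illustration: the two spaces, their weights and the helper names `product_space` and `distribution` are assumptions made for the example, not objects from the text.

```python
from fractions import Fraction
from itertools import product

def product_space(P1, P2):
    """Product measure on Omega_1 x Omega_2; spaces are dicts point -> probability."""
    return {(w1, w2): p1 * p2
            for (w1, p1), (w2, p2) in product(P1.items(), P2.items())}

def distribution(P, xi):
    """Probability distribution of a random variable xi on a finite space P."""
    dist = {}
    for w, p in P.items():
        dist[xi(w)] = dist.get(xi(w), 0) + p
    return dist

# Two illustrative finite probability spaces (hypothetical weights).
P1 = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
P2 = {"u": Fraction(1, 4), "v": Fraction(3, 4)}
xi1 = lambda w: 1 if w == "a" else 0
xi2 = lambda w: 1 if w == "u" else 0

P = product_space(P1, P2)
# Lifted variables: each depends only on its own coordinate.
lifted1 = lambda w: xi1(w[0])
lifted2 = lambda w: xi2(w[1])
```

The lifted variables have the same distributions as the original ones, which is all the formal construction guarantees; it says nothing about whether pairs of elements are meaningful objects of the original ensemble.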


3. Measure-theoretical approach and frequency probability. The
original viewpoint of R. von Mises was that Kolmogorov’s probability measure
is nothing other than the probability distribution $P_x$ (on the label set
$L_x$) of a collective $x$. The Kolmogorov probability space in Mises’ theory
is chosen as $\mathcal{P}_x = (L_x, \mathcal{F}_{L_x}, P_x)$, where in the
general case $\mathcal{F}_{L_x}$ is some $\sigma$-algebra of subsets of
$L_x$. As we have already pointed out, in the continuous case not all sets
$A \in \mathcal{F}_{L_x}$ have a frequency meaning. In particular, if the
measure-theoretical approach is used for the description of frequency
phenomena, then the possibility of the frequency verification for events
$A \in \mathcal{F}_{L_x}$ must be controlled. However, it is even more
important to control continuously the dependence of a probability space on a
collective.


Let us consider the following example (which is similar to the example
considered in the ensemble framework). Let $x_j$, $j = 1,2$, be two
collectives with the label sets $L_j$ and probability distributions
$P_{x_j}$. Let $\mathcal{P}_j = (L_j, \mathcal{F}_{L_j}, P_{x_j})$,
$j = 1,2$, be the corresponding Kolmogorov probability spaces. Let
$A_j \in \mathcal{F}_{L_j}$. Suppose that somewhere we need to use the
conditional probability $P(A_1/A_2)$. What is the meaning of Bayes’ formula
(2.4) in this case?

4. Measure-theoretical approach and ensemble-frequency interpretation.
As we have already mentioned, typically Kolmogorov’s measure-theoretical
formalism on abstract probability spaces is used together with the
ensemble-frequency interpretation of probability. However, as in the cases of
the ensemble and frequency theories, we must be careful with applications
of the abstract measure-theoretical formalism. We study the question of the
choice of a probability space for a concrete probability experiment.


Part (a) of the ensemble-frequency interpretation of probability implies
that the space $\Omega$ must describe occurrences of events in very long
sequences of repetitions of some conditions $\Sigma$ (in the mathematical
formalism sequences can have infinite length). It seems that collectives can
be used for the description of such a phenomenon. However, part (b) is related
to occurrences of events under a single realization of the conditions
$\Sigma$. The probability of a single realization is nonsense for collectives.
Let us try to resolve the contradiction between probability in a long sequence
of repetitions of $\Sigma$ and a single realization of $\Sigma$. We may
consider the space $C$ of all possible collectives which can be induced by
repetitions of $\Sigma$. Then we may introduce on $C$ a probability measure
$P$ (which seems to have the meaning of an ensemble probability for the
ensemble $C$) that would provide a mathematical description of part (b). The
latter would mean that if $P(A)$ is very small, then a single realization of
$A$ (in one of the collectives $x \in C$) is practically impossible (from the
ensemble viewpoint). However, in the standard formalism the space $C$ of all
collectives is not used as a space of elementary events $\Omega$.⁵ Instead of
$C$, the space $\Omega = L^\infty$ of all infinite sequences of labels
$\alpha \in L$ is used. Such a choice gives measure-theoretical advantages.
However, this implies the consideration of sequences which have no
probabilistic meaning.


⁵ Constructive probability theory (see, for example, [79]) can be considered
as an attempt to realize, at the mathematical level of rigor, the idea of
using $C$ as a space of elementary events.

We now construct Kolmogorov’s probability measure $P^{\mathrm{Kol}}$ on the
space of sequences $\Omega$ which gives (as is commonly accepted) the
mathematical realization of (a) and (b). We start with the consideration of a
symmetric coin (with sides denoted by the symbols 0 and 1), $L = \{0,1\}$.
Here we can use the classical definition of probabilities as the starting
point for the construction of $P^{\mathrm{Kol}}$. As there are two equally
possible cases, the classical probabilities are
$P^{\mathrm{cl}}(0) = P^{\mathrm{cl}}(1) = 1/2$. Now consider $m$ trials of
the coin and write all possible samples (3.2). At this point it seems that the
formalism is developed in the same way as in the von Mises theory. However,
the next step demonstrates the crucial difference between the two approaches.
Denote by $S_m = L^m$ the set of all vectors of length $m$ with coordinates
0, 1. This set is considered as a statistical ensemble. Thus, for
$i = (i_1, \ldots, i_m) \in S_m$, the ensemble probability is
$P_{S_m}(i) = 1/|S_m| = 1/2^m$. Bernoulli proved the following mathematical
result for these ensemble probabilities:

Theorem 6.1. (Bernoulli) The larger $m$ is, the larger is the proportion
of those vectors in $S_m$ in which the relative number of zeros (or of ones)
deviates from 1/2 by less than a given $\epsilon$.

Obviously this is a result for proportional probability. But Bernoulli
and most authors state this result as a result for the frequency probability:
if one throws a ‘true’ coin long enough it is almost certain that the relative
number of heads will deviate by less than $\epsilon$ from 1/2.
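
Read as a statement about proportions, Theorem 6.1 can be checked by exact counting, with no coin tossed at all. The sketch below is only an illustration: the function name `proportion_near_half` and the chosen values of $m$ and $\epsilon$ are assumptions for the example.

```python
from fractions import Fraction
from math import comb

def proportion_near_half(m, eps):
    """Exact proportion of vectors in S_m = {0,1}^m whose relative number
    of ones deviates from 1/2 by less than eps (pure counting, no sampling)."""
    half = Fraction(1, 2)
    good = sum(comb(m, k) for k in range(m + 1)
               if abs(Fraction(k, m) - half) < eps)
    return Fraction(good, 2 ** m)

eps = Fraction(1, 10)  # illustrative tolerance
proportions = {m: proportion_near_half(m, eps) for m in (10, 100, 1000)}
```

Exact rational arithmetic is used so that the strict inequality at the boundary is not affected by floating-point rounding.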

The Kolmogorov probability measure $P^{\mathrm{Kol}}$ on the space of
elementary events

$$\Omega = \{\omega = (\omega_1, \ldots, \omega_n, \ldots) : \omega_j \in L\},$$

where $L = \{0,1\}$, is defined with the aid of the ensemble probabilities
$P_{S_m}$. For $i = (i_1, \ldots, i_m) \in S_m$, a cylindrical subset of
$\Omega$ with the base $i$ is defined as
$B_i = \{\omega \in \Omega : \omega_1 = i_1, \ldots, \omega_m = i_m\}$. We set
$P^{\mathrm{Kol}}(B_i) = P_{S_m}(i) = 1/2^m$. Denote the $\sigma$-algebra
generated by all cylindrical subsets by $\mathcal{F}$ (i.e., this is the
minimal $\sigma$-algebra which contains all cylindrical subsets of $\Omega$).
$P^{\mathrm{Kol}}$ is extended as a $\sigma$-additive measure to the
$\sigma$-algebra $\mathcal{F}$.

It is typically assumed that the frequency part (a) of the interpretation
can be described by the following mathematical result for the measure
$P^{\mathrm{Kol}}$.

Theorem 6.2. (Law of large numbers) For any $\epsilon > 0$,

$$P^{\mathrm{Kol}}(\{\omega \in \Omega : |\nu_m(1;\omega) - 1/2| > \epsilon\}) \to 0, \quad m \to \infty,$$

where $\nu_m(1;\omega) = n_m(1;\omega)/m$ and
$n_m(1;\omega) = \sum_{j=1}^{m} \omega_j$.

However, like the classical Bernoulli theorem, the law of large numbers is
not connected with the frequency approximation of probabilities. This is a
statement on the approximation of the classical probabilities
$P^{\mathrm{cl}}(0) = P^{\mathrm{cl}}(1) = 1/2$ by ensemble probabilities. On
the other hand, we could use the so-called strong law of large numbers.

Theorem 6.3. (Strong law of large numbers) There exists a subset
$\Omega_0 \in \mathcal{F}$ such that $P^{\mathrm{Kol}}(\Omega_0) = 1$ and
$\nu_m(1;\omega) \to 1/2$, $m \to \infty$, for all sequences
$\omega \in \Omega_0$.

But on the basis of this statement we cannot say anything about the
statistical stabilization of $\nu_m(1;\omega)$ for any concrete sequence
$\omega \in \Omega$. The strong law of large numbers does not say anything
about a frequency approximation of ensemble probabilities; this is a statement
about the frequency approximation of the classical probabilities
$P^{\mathrm{cl}}(0) = P^{\mathrm{cl}}(1) = 1/2$ in the sense of the ensemble
probabilities.

Conclusion. The laws of large numbers cannot be applied for describing 
the statistical stabilization of frequencies in sampling experiments. 

We now construct Kolmogorov’s measure $P^{\mathrm{Kol}}$ in the general
case. The classical definition of probability cannot be used for a
nonsymmetrical coin. We use the frequency definition. The statistical
experiments of coin tossing produce collectives $x$ with the label set
$L = \{0,1\}$. Let us assume that all these collectives have the same
probability distribution: $q_0 = P_x(0)$ and $q_1 = 1 - q_0 = P_x(1)$. For a
cylindric set
$B_i = \{\omega \in \Omega : \omega_1 = i_1, \ldots, \omega_k = i_k\}$,
$i = (i_1, \ldots, i_k)$, $i_l \in L = \{0,1\}$, the probability is defined as

$$P^{\mathrm{Kol}}(B_i) = q_1^{|i|}\, q_0^{k-|i|}, \qquad (6.1)$$

where $|i| = i_1 + \cdots + i_k$.
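
Formula (6.1) is easy to evaluate directly. The sketch below is a small illustration with exact fractions; the function name `cylinder_probability` and the value of $q_1$ are assumptions chosen for the example.

```python
from fractions import Fraction
from itertools import product

def cylinder_probability(i, q1):
    """Probability of the cylinder with base i in {0,1}^k according to (6.1):
    q1^{|i|} * q0^{k-|i|}, where |i| is the number of ones in i."""
    q0 = 1 - q1
    ones = sum(i)
    return q1 ** ones * q0 ** (len(i) - ones)

q1 = Fraction(1, 3)  # illustrative value, not taken from the text
p_101 = cylinder_probability((1, 0, 1), q1)
# Cylinders with bases of a fixed length k partition the whole space.
total = sum(cylinder_probability(i, q1) for i in product((0, 1), repeat=4))
```

That the probabilities of all cylinders with bases of a fixed length sum to one is the consistency needed for the extension to the generated $\sigma$-algebra.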

In the symmetric case ($q_0 = q_1 = 1/2$) the origin of formula (6.1) has
been explained in the ensemble framework. In the general case we cannot
apply the ensemble framework. Here we can apply frequency arguments.
Let $x^{(j)} = (x^{(j)}_l)_{l=1}^{\infty}$, $j = 1,2,\ldots,k$, be collectives
having the same label space $L = \{0,1\}$ and probability distribution
$P_{x^{(j)}}(0) = q_0$, $P_{x^{(j)}}(1) = q_1$. We form a new collective
$x = (x_l)_{l=1}^{\infty}$ with the label space
$L^k = L \times \cdots \times L$ by setting
$x_l = (x^{(1)}_l, \ldots, x^{(k)}_l)$, $l = 1,2,\ldots$ We assume that the
collectives $x^{(j)}$ are independent (see sections 9, 10 for the details).
In particular, this implies the factorization of the probability distribution
$P_x$ into a product of the probability distributions $P_{x^{(j)}}$. Thus,
for each $i = (i_1, \ldots, i_k)$, $i_j = 0,1$, there exists

$$P_x(i) = \lim_{M\to\infty} \nu_M(i;x) = \prod_{j=1}^{k} \lim_{M\to\infty} \nu_M(i_j; x^{(j)}) = \prod_{j=1}^{k} P_{x^{(j)}}(i_j), \qquad (6.2)$$

where $\nu_M(i;x) = n_M(i;x)/M$ and
$\nu_M(i_j; x^{(j)}) = n_M(i_j; x^{(j)})/M$ are relative frequencies for
labels $i \in L^k$ and $i_j \in L$ (in the collectives $x$ and $x^{(j)}$,
respectively). Formula (6.2) can be used as the motivation for definition
(6.1) of the probability of a cylindric subset $B_i$ of $\Omega$.

The $P^{\mathrm{Kol}}$ defined on cylindric subsets by (6.2) can be extended
to a probability measure on the $\sigma$-algebra $\mathcal{F}$ of $\Omega$
generated by cylindric subsets.

We now analyse how $P^{\mathrm{Kol}}$ serves purposes (a) and (b). Here we
can also use the strong law of large numbers:

Theorem 6.4. (Strong law of large numbers for nonsymmetrical
distributions) There exists a subset $\Omega_0 \in \mathcal{F}$ such that
$P^{\mathrm{Kol}}(\Omega_0) = 1$ and
$\nu_M(\alpha;\omega) = n_M(\alpha;\omega)/M \to q_\alpha$, $M \to \infty$,
$\alpha = 0,1$, for all sequences $\omega \in \Omega_0$.

It seems that (with the same remarks as in the symmetric case) this is
the mathematical realization of (a). Part (b) can be interpreted in the
following way. If, for example, $q_0 \ll 1$ then, for each $j$,
$P^{\mathrm{Kol}}(\omega : \omega_j = 0) = q_0 \ll 1$. Thus the ‘probability
to obtain 0 in the jth test is practically zero.’ In fact, the problem is more
complicated. In the nonsymmetrical case we cannot interpret sequences
$\omega \in \Omega$ (even some of them) as collectives generated by the
statistical experiment⁶. The construction of Kolmogorov’s measure
$P^{\mathrm{Kol}}$ demonstrates that $\omega \in \Omega$ have the meaning of
(infinite) ‘multi-labels’ for the collective $x = (x_l)_{l=1}^{\infty}$, where
$x_l = (x^{(1)}_l, \ldots, x^{(j)}_l, \ldots) \in \Omega$, which is obtained
on the basis of a sequence $\{x^{(j)}\}$, $j = 1,2,\ldots$, of independent
collectives having the same probability distribution $q_\alpha$,
$\alpha = 0,1$ (i.e., a sequence (for $j = 1,2,\ldots$) of parallel running
sequences (for $l = 1,2,\ldots$) of coin tossings). Thus the strong law of
large numbers says that ‘practically all’ these ‘multi-labels’ have the
property of statistical stabilization and the limits of relative frequencies
(accidentally!) coincide with the probabilities $q_\alpha$ corresponding to
the collectives. Thus the probability measure $P^{\mathrm{Kol}}$ describes the
frequency approximation of probabilities only indirectly.

$P^{\mathrm{Kol}}$ describes only random phenomena which have the property
of ergodicity. Ergodicity has the following meaning. First we consider
the statistical experiment in which one person makes a long run of coin
tossings, $u = (u_1, \ldots, u_M, \ldots)$, and obtains the relative
frequencies $\nu_M(\alpha;u) = n_M(\alpha;u)/M$, $\alpha = 0,1$. Then we
consider another statistical experiment in which all persons belonging to a
large statistical ensemble $S$ (population) simultaneously make just one coin
toss. As the result of the latter experiment we obtain the proportions (in
$S$), $\nu_S(\alpha) = |S(\alpha)|/|S|$, $\alpha = 0,1$, of persons who have
obtained the label $\alpha$. Then $\nu_M(\alpha;u) \approx \nu_S(\alpha)$ for
large $M$ and $|S|$. Of course, we cannot assume that all random phenomena
have the property of ergodicity. Thus in general the ensemble and frequency
interpretations of probability must be separated.
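
The two experiments can be imitated numerically. The sketch below is only an illustration under stated assumptions: coin tosses are modelled by a seeded pseudo-random generator, the bias 0.3 is arbitrary, and the ‘non-ergodic’ population, in which every person owns a coin that always shows the same side, is a hypothetical contrast case that does not come from the text.

```python
import random

def toss(bias, rng):
    """One toss of a coin that shows label 1 with probability `bias`."""
    return 1 if rng.random() < bias else 0

def time_frequency(bias, M, rng):
    """One person tosses the same coin M times: relative frequency of label 1."""
    return sum(toss(bias, rng) for _ in range(M)) / M

def ensemble_proportion(biases, rng):
    """Every person tosses their own coin once: proportion of label 1 in S."""
    return sum(toss(b, rng) for b in biases) / len(biases)

rng = random.Random(0)
M = size = 100_000

# Ergodic case: all persons use identical coins.
nu_time = time_frequency(0.3, M, rng)
nu_ens = ensemble_proportion([0.3] * size, rng)

# Hypothetical non-ergodic case: each coin always shows the same side,
# and the two kinds of coins are equally common in the population.
biases = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(size)]
nu_time_one_person = time_frequency(biases[0], M, rng)
nu_ens_mixed = ensemble_proportion(biases, rng)
```

In the first case both numbers are close to 0.3. In the second case the single long run gives 0 or 1 while the population proportion is close to 1/2, so the frequency and the ensemble proportion have to be kept apart.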

⁶ Thus in the nonsymmetrical case the strong law of large numbers cannot be
interpreted in the same way as in the symmetric case. Only in the symmetric
case can we interpret some of the ‘elementary events’ $\omega \in \Omega$ as
collectives generated by coin tossing.

Remark 6.2. Let the collectives $x^{(j)}$ be independent, but not in general
equally distributed: $P_{x^{(j)}}(\alpha) = q_{\alpha j}$, $\alpha = 0,1$.
Then we obtain that $P_x(i) = \prod_{j=1}^{k} q_{i_j j}$. This can be used as
the motivation to define the probability of a cylindric subset of $\Omega$ by
$P^{\mathrm{Kol}}(B_i) = \prod_{j=1}^{k} q_{i_j j}$. We underline again that
here we cannot use ensemble arguments to define probabilities of cylindric
subsets.

Conclusion. Kolmogorov’s ensemble-frequency interpretation can be used 
only for ergodic random phenomena. 


7 Subjective (Bayesian) probability theory 


According to the subjective interpretation of probability, it is the degree
of belief in the occurrence of an event attributed by a given person at a
given instant and with a given set of information that is important. It is
very important for our further quantum mechanical considerations that changing
information changes probabilities. We illustrate this by an example.

Example 7.1. I have forgotten something: have I sent a letter to my
friend or not? I can propose my subjective probabilities $q_1$ (the letter was
sent), $q_2$ (it was not sent), $q_1 + q_2 = 1$, $q_j \in [0,1]$. Suppose that
we have an ideal postal system, i.e., a letter cannot disappear in the postal
service. If I telephone my friend and he tells me that he has received the
letter, then at that moment the probabilities will immediately change:
$q_1 \to 1$ and $q_2 \to 0$; in the opposite case, $q_1 \to 0$ and
$q_2 \to 1$.⁷

In fact the subjective theory of probability is a sufficiently good theory
from the operational point of view. The main problem of this approach is how
to choose the subjective probabilities in a concrete case. In this theory it is 
postulated that the probability depends on the status of information which 
is available to whoever evaluates probability. Thus the evaluation of proba- 
bility is conditioned by some a priori (‘theoretical’) prejudices and by some 
facts (‘experimental data’). However, in applications all this information is 
nothing other than information about frequency or proportional probabili- 
ties. 

It must be noted that the subjective probability theory is described
mathematically by the Kolmogorov probability space
$(\Omega, \mathcal{F}, P)$. The Bayes formula (2.4) is the cornerstone of this
theory (therefore, it is also called Bayesian theory). As we have discussed,
in principle we can exclude (2.4) from the Kolmogorov theory and consider a
more general formalism which describes violations of (2.4). Such an approach
is impossible in the subjective framework.

⁷ In the quantum formalism such a reduction of subjective probabilities is
nothing other than the so-called collapse of a wave function
($\phi = \sqrt{q_1}\,\phi_1 + \sqrt{q_2}\,\phi_2$).

The subjective probability theory is applied in the following form. There is
a fixed set of hypotheses (events) $H_j \in \mathcal{F}$:
$\bigcup_j H_j = \Omega$, $H_i \cap H_j = \emptyset$, $i \neq j$.
Let $E \in \mathcal{F}$ be an event. Suppose that we know the conditional
probabilities $P(E/H_j)$. Then we find $P(H_i/E)$ by (3.8) and the formula of
total probability, $P(E) = \sum_j P(E/H_j)P(H_j)$, i.e.,

$$P(H_i/E) = \frac{P(E/H_i)\,P(H_i)}{\sum_j P(E/H_j)\,P(H_j)}. \qquad (7.1)$$

This is the standard form of Bayes’ theorem.
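
As a small worked illustration of this form of Bayes’ theorem, the sketch below computes posterior probabilities over a finite set of hypotheses. The function name `posterior`, the dictionary layout and the numerical priors are assumptions made for the example; the first case imitates the letter of Example 7.1 with an ideal postal system.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Posterior P(H_i/E) from priors P(H_i) and likelihoods P(E/H_i),
    normalised by the total probability P(E) (assumed to be non-zero)."""
    total = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / total for h in priors}

# Letter example: E = "the friend says he has received the letter".
priors = {"sent": Fraction(2, 3), "not_sent": Fraction(1, 3)}   # hypothetical q_1, q_2
likelihoods = {"sent": Fraction(1), "not_sent": Fraction(0)}    # ideal postal system
after_call = posterior(priors, likelihoods)

# A non-degenerate case with equal priors and illustrative likelihoods.
soft = posterior({"H1": Fraction(1, 2), "H2": Fraction(1, 2)},
                 {"H1": Fraction(3, 4), "H2": Fraction(1, 4)})
```

In the letter case the posterior jumps to 1 and 0 whatever non-zero prior $q_1$ is chosen, which is the immediate change of probabilities described in the example.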

Remark 7.1. Of course, Bayes’ formula plays a great role in probability
theory. However, as we have seen, there are restrictions on the use of this
formula. These are also restrictions on the use of Bayesian probability
theory. According to Bayesian theory $P(E) = P(E/H)$ is a subjective
probability (a measure of an individual belief) on the basis of the known set
of conditions $H$; in particular, $P(E) = P(E/\Omega)$ corresponds to the set
of all conditions. Therefore it is assumed that we can always extract the
information $H$ from the total amount of information $\Omega$.

The main positive consequence of the subjective approach to probability
theory is the connection between probability and information. The idea
that probability is a measure of information on a random phenomenon (for
example, a statistical ensemble) looks attractive. Typically such information
is reduced to our subjective knowledge about a random phenomenon.
This information probability is coded by a real number
$q_{\mathrm{inf/pr}} \in [0,1]$. Intuitively it is identified with the
classical probability (based on the proportion of equally possible cases) or
with the ensemble probability. However, the relation between subjective
probability and classical or ensemble probabilities is indirect. The
subjective probability approach claims that $q_{\mathrm{inf/pr}}$ is chosen
on the basis of ‘subjective reasons’ of an individual.

The subjective probability approach can be strongly improved if we assume
that ‘subjective reasons’ are nothing other than the calculation of probability with
respect to an ensemble of ideas $S$ (in the brain of an individual) which are
connected with the concrete random phenomenon. Thus
$q_{\mathrm{inf/pr}}(\alpha) = |S(\alpha)|/|S|$, where $S(\alpha)$ is the
sub-ensemble of ideas which imply the property $\alpha$. We shall study the
connection between subjective (information) probability and probability on the
space of ideas in Chapter 5 (in a p-adic information framework).

On the other hand, it seems natural to generalize the subjective probability
approach and construct an information-probability theory in which (1) the
information which is coded in $q_{\mathrm{inf/pr}}$ is not considered as
subjective information (i.e., $q_{\mathrm{inf/pr}}$ is an element of
information reality which is no less objective than material physical
reality); (2) $q_{\mathrm{inf/pr}}$ can be coded not only by real numbers
belonging to [0,1], but also by some other information vectors.


8 Foundations of Randomness 


We study some special questions of the frequency probability theory con- 
nected with the principle of randomness (see, for example, [79], [103], [6] for 
the details). 

1. Existence of collectives; Kamke’s objection. As we have already 
remarked, the principle of randomness based on the invariance of limits of 
relative frequencies with respect to the set of all possible place selections is 
too general. In fact, there are no sequences which satisfy this principle. To 
show this, we follow arguments of E. Kamke [49] (see also [79]). 

Let $L = \{0,1\}$ and $x = (x_j)_{j=1}^{\infty}$, $x_j \in L$, be a collective
which induces the probability distribution $P(\alpha) = 1/2$, $\alpha = 0,1$.
Consider the set $SI$ of all strictly increasing sequences of natural numbers.
This set can be formed independently of $x$; but, among the elements of $SI$,
we have the strictly increasing sequence $\{n : x_n = 1\}$. This sequence
defines a place selection which selects the subsequence (11...1...) from $x$.
Hence $x$ is not a collective after all!
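
Kamke’s selection is easy to write down for a finite initial segment. The sketch below is an illustration only; the function name `kamke_indices` and the sample word are assumptions. Note that the index sequence is computed from the values $x_n$ themselves.

```python
def kamke_indices(x):
    """Strictly increasing index sequence {n : x_n = 1} (0-based),
    read off from the sequence x itself."""
    return [n for n, label in enumerate(x) if label == 1]

x = [0, 1, 1, 0, 1, 0, 0, 1]      # illustrative initial segment
indices = kamke_indices(x)
selected = [x[n] for n in indices]
```

The selected subsequence consists of ones only, so the relative frequency of 1 in it is 1 rather than 1/2.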

The reader may well feel uncomfortable with the mathematical structure
of the argument. Kamke claims to have shown that for every putative collective
$x$ there exists a place selection $\phi$ that disturbs the statistical
stabilization of frequencies to probability 1/2. The use of the existential
quantifier here is classical (Platonistic). Indeed, it seems impossible to
exhibit explicitly a procedure which satisfies von Mises’ criterion
(independence of the value $x_n$) and at the same time selects the subsequence
(11...1...) from $x$. An interesting analysis of this problem can be found in
the review of M. van Lambalgen [103]. He is convinced that a satisfactory
treatment of random sequences is possible only in set theories lacking the
power set axiom, in which random sequences “are not already there.” However,
even if we uncritically accept classical mathematics, Kamke’s argument is
somewhat beside the mark in that it fails to appreciate the purpose of von
Mises’ axiomatization. It refers to what could happen, whereas Mises’ axioms
are rooted in experience and refer to what does happen.

Remark 8.1. In various places von Mises likens the principle of randomness
to the first law of thermodynamics. Both are statements of impossibility: the
principle of randomness is the principle of the excluded gambling strategy, while the
first law (conservation of energy) is equivalent to the impossibility of a perpetuum
mobile of the first kind. I think that such an analogy is not so natural. It would be
more natural to connect the first law of thermodynamics with the first von Mises
principle, the principle of the statistical stabilization of relative frequencies. The
impossibility of performing precise measurements implies that the law of conservation
of energy is only a statistical law. Thus it is just one of the manifestations of the
principle of the statistical stabilization of relative frequencies. M. van Lambalgen
compared the principle of randomness with the second law of thermodynamics, the law of
increase of entropy or the impossibility of a perpetuum mobile of the second kind.
Indeed, Kamke’s objection is reminiscent of Maxwell’s celebrated demon, that
“very observant and neat-fingered being”, invented to show that entropy decreasing
evolutions may occur. Maxwell’s argument of course in no way detracts from the
validity of the second law, but serves to highlight the fact that statistical mechanics
cannot provide an absolute foundation for entropy increase, since it does not talk
about what actually happens (see [103] for further mathematical details).

The early attempts to formalize Mises’ principle of randomness were based
on considerations of different classes of lawlike place selections. The idea
was to fix some class of lawlike place selections and then construct a set of
collectives with respect to that class. Various authors (e.g. Popper,
Reichenbach, Copeland) independently arrived at the so-called Bernoulli
selections. To discuss this class of place selections, it is convenient to
formalize the definition of place selection.

Denote by $L^*$ the set of all finite words $x = (x_1, \ldots, x_m)$,
$x_j \in L$, $m = 1,2,\ldots$, in the alphabet (label set)
$L = \{\alpha_1, \ldots, \alpha_l\}$, $l > 1$; as usual the symbol
$L^\infty$ is used to denote the set of all infinite sequences
$x = (x_1, \ldots, x_m, \ldots)$, $x_j \in L$. Set
$x_{1:n} = (x_1, \ldots, x_n)$ for $x \in L^\infty$ (this is the initial
segment of length $n$ of the sequence $x$). A place selection $\phi$ is
defined on the basis of a function $f : L^* \to \{0,1\}$. The domain of
definition of a place selection $\phi$ corresponding to $f$ is the set

$$\mathrm{dom}\,\phi = \{x \in L^\infty : \forall n\ \exists k > n : f(x_{1:k}) = 1\} \subset L^\infty.$$

For $x \in \mathrm{dom}\,\phi$, we set
$\phi(x) = \bigcup_n \phi(x_{1:n})$ (the infinite sequence that has every
finite word $\phi(x_{1:n})$ as an initial segment), where the map
$\phi : L^* \to L^*$ is defined as $\phi(ua) = \phi(u)a$ if $f(u) = 1$ and
$\phi(ua) = \phi(u)$ if $f(u) = 0$ (here $u = (u_1, \ldots, u_m)$ and
$ua = (u_1, \ldots, u_m, a)$, $u_j, a \in L$). Thus a place selection $\phi$
is a partial function $\phi : L^\infty \to L^\infty$.
Example 8.1. (Bernoulli sequences) Let $w = (w_1, \ldots, w_s)$,
$w_j \in L$, be a fixed word. For a sequence $x \in L^\infty$, we choose all
$x_n$ such that $w$ is a final segment of $x_{1:n-1}$. The domain of this
place selection, $\phi_w$, is the set of all sequences $x \in L^\infty$ which
contain infinitely many occurrences of the word $w$. Formally $\phi_w$ is
defined on the basis of the function $f_w : L^* \to \{0,1\}$, $f_w(u) = 1$ if
$w$ is a final segment of $u$, $f_w(u) = 0$ if not. A Bernoulli sequence
(with respect to a probability distribution $P(\alpha_j) = p_j$,
$j = 1, \ldots, l$, $L = \{\alpha_1, \ldots, \alpha_l\}$) is a sequence
$x \in L^\infty$ such that $\lim_{n\to\infty} \nu_n(\alpha_j; x) = p_j$,
$j = 1, \ldots, l$, and for all words $w \in L^*$

$$\lim_{n\to\infty} \nu_n(\alpha_j; \phi_w x) = p_j, \quad j = 1, \ldots, l, \qquad (8.1)$$

where $\nu_n(\alpha_j; \phi_w x)$ is the relative frequency of occurrence of
the label $\alpha_j$ in the initial segment of length $n$ of the sequence
$\phi_w x$. If $L = \{0,1\}$ is the binary alphabet and $P_x(1) = p$,
$P_x(0) = 1 - p$, then (8.1) has the form

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} (\phi_w x)_j = p$$

for all words $w \in \{0,1\}^*$.
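
On finite initial segments the recursive definition of a place selection can be written out directly. The sketch below is an illustration only; the names `select` and `bernoulli_rule` and the sample word are assumptions. The label at position $n$ is kept exactly when the rule $f$, applied to the preceding initial segment, returns 1.

```python
def select(f, x):
    """Apply the place selection with rule f to a finite word x:
    keep x[n] exactly when f(x[:n]) == 1 (the rule never sees x[n])."""
    return [label for n, label in enumerate(x) if f(tuple(x[:n])) == 1]

def bernoulli_rule(w):
    """Rule f_w: f_w(u) = 1 if the word w is a final segment of u, else 0."""
    w = tuple(w)
    def f(u):
        return 1 if tuple(u[len(u) - len(w):]) == w and len(u) >= len(w) else 0
    return f

x = [0, 1, 1, 0, 1, 0, 1, 1]          # illustrative initial segment
after_01 = select(bernoulli_rule((0, 1)), x)
```

Here the word (0, 1) ends the initial segments of lengths 2, 5 and 7, so the labels that follow them are the ones selected.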

The sets of Bernoulli place selections and sequences are denoted by the
symbols $U_B$ and $X_B$, respectively.

A. Church [19] suggested considering the set $U_{Ch}$ of place selections
which are generated by total recursive functions $f : L^* \to \{0,1\}$
(functions which can be computed by using algorithms). Church’s collectives
(random sequences) are sequences $x \in L^\infty$ which satisfy the principle
of statistical stabilization and the principle of randomness for the set of
place selections $U_{Ch}$. Denote the set of Church’s collectives by the
symbol $X_{Ch}$.

Both the sets $U_B$ and $U_{Ch}$ are countable. The existence of Bernoulli
sequences and Church’s collectives is a consequence of the general result of
A. Wald [110].

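Let $p = (p_j)$, $p_j = P(\alpha_j)$, $j = 1, \ldots, l$,
$L = \{\alpha_1, \ldots, \alpha_l\}$, be a probability distribution on the
label set $L$. Let $U$ be a set of place selections. We set

$$X(U,p) = \{x \in L^\infty : \forall \phi \in U\ \lim_{n\to\infty} \nu_n(\alpha_j; \phi x) = p_j,\ j = 1, \ldots, l\}.$$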


Theorem 8.1. (Wald) For any countable set $U$ of place selections and
any probability distribution $p$ on the label set $L$, the set of sequences
$X(U,p)$ has the cardinality of the continuum.

Thus, at least for countable sets of place selections $U$, Mises’ frequency
theory of probability can be developed at the mathematical level of rigor.
R. von Mises was completely satisfied by this situation (see [88]).
However, he was strongly against the idea of fixing once and for all a set of
place selections. According to von Mises, the concrete set of place selections
is determined by the physical problem. But mathematicians prefer to consider
fixed classes of place selections. In particular, a large part of the
mathematical community considers Church’s choice the most reasonable. The
author does not think that the choice of total recursive functions as place
selections can be justified by physical arguments. The idea that the reality
which can be studied by the human mind can be reduced to a reality produced by
Turing machines looks rather primitive in the light of modern investigations
of the processes of thinking. It seems that the brain uses transformations
$\phi : L^* \to L^*$ which are based on non-recursive functions, [65], [66].

2. Geometric and frequency spaces. According to the modern ideology
of geometry, a geometric model is a pair $(X, G)$, where $X$ is a set of
points and $G$ is a group of transformations of $X$. Such an approach is
closely connected with von Mises’ approach to probability theory. Here we have
a system of place selections $U$ (which plays the role of a group of
transformations $G$) and the space $X(U,p)$ of ‘probabilistic points’. The
pair $(X(U,p), U)$ can be called a frequency probability model. Moreover, as
in geometry, we have to consider some algebraic structure on the system of
transformations $U$. We shall demonstrate that we have to use semigroups (with
unit) of transformations $U$.

Let $U$ be a system of place selections containing the identity
transformation. If $x \in X(U,p)$, it is natural to assume that, for each
$\phi \in U$, $y = \phi x \in X(U,p)$: each element $\phi$ of $U$ transforms
a $U$-collective $x$ into a new $U$-collective. Thus, for each $\psi \in U$,
the sequence $z = \psi y = (\psi \circ \phi) x$ satisfies the principle of
statistical stabilization. Let $f = \psi \circ \phi \notin U$. Then we can
extend the system of transformations $U$ by setting $U' = U \cup \{f\}$. It is
evident that (under our assumption) the set of points $X(U',p)$ coincides with
the set $X(U,p)$. Therefore it would be natural to assume from the beginning
that $U$ is a semigroup.

One of the nice examples of frequency probability spaces is the space
$(X_{Ch}, U_{Ch})$ based on the system of total recursive functions.
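
The closure of place selections under composition can be made concrete on finite words. The sketch below is an illustration under stated assumptions: selections are represented by their rules on initial segments, and the names `select`, `compose_rules`, `after_one` and `every_second` are hypothetical. The composite rule keeps a label when the first rule keeps it and the second rule, applied to what the first one has selected so far, also returns 1.

```python
def select(f, x):
    """Keep x[n] exactly when the rule f returns 1 on the initial segment x[:n]."""
    return [label for n, label in enumerate(x) if f(tuple(x[:n])) == 1]

def compose_rules(f_phi, f_psi):
    """Rule of the composite selection psi after phi, as a rule on the original word."""
    def g(u):
        kept_so_far = tuple(select(f_phi, u))
        return 1 if f_phi(u) == 1 and f_psi(kept_so_far) == 1 else 0
    return g

# Two illustrative rules: select after a 1; select every second place.
after_one = lambda u: 1 if len(u) > 0 and u[-1] == 1 else 0
every_second = lambda u: 1 if len(u) % 2 == 1 else 0

x = [1, 0, 1, 1, 0, 1, 1, 1]
two_steps = select(every_second, select(after_one, x))
one_step = select(compose_rules(after_one, every_second), x)
```

Both computations give the same selected word, so the composition is again a place selection defined by a single rule on initial segments.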

3. Ville’s objection. Although Wald’s reformulation of von Mises’
ideas solved the problem of consistency, it led to an objection of an entirely
different kind.

Theorem 8.2. (Ville, [106]) Let $L = \{0,1\}$ and let
$U = \{\phi_n\}_{n=1}^{\infty}$ be a countable set of place selections. Then
there exists $x \in L^\infty$ such that

(a) for all $n$, $\lim_{N\to\infty} \frac{1}{N} \sum_{j=1}^{N} (\phi_n x)_j = \frac{1}{2}$;

(b) for all $N$, $\frac{1}{N} \sum_{j=1}^{N} x_j \geq \frac{1}{2}$.

Such an $x$ is a collective with respect to $U$ ($x \in X(U, 1/2)$), but seems
to be far too regular to be called random. Formally, $x$’s with property (b)
form a set of Lebesgue measure 0 (this is a consequence of the law of the
iterated logarithm).

4. Ensemble probability approach to randomness. Ville and
Fréchet used Theorem 8.2 to argue that collectives in the sense of von Mises
and Wald do not necessarily satisfy all intuitively required properties of
randomness. Ville introduced a new way of characterizing random sequences,
based on the following idea: a random sequence should satisfy all properties
of probability one. Strictly speaking, this is of course impossible: we have to
choose countably many from among those properties. It must be underlined
that Ville’s idea is really completely foreign to von Mises. For von Mises, a
collective $x \in L^\infty$ induces a probability on the set of labels $L$,
not on the set of all sequences $L^\infty$. Hence there is no connection at
all between properties of probability one in $L^\infty$ and properties of
individual collectives.

Per Martin-Löf [83]-[85] proposed to consider ‘recursive properties of
probability one’ (i.e., properties which can be tested with the aid of
algorithms). Such an approach induces the fruitful theory of recursive tests
for randomness (see, for example, [79], [112]). A similar approach was
developed by Schnorr [98]. We underline that the approaches of Martin-Löf and
Schnorr (as well as Ville and Fréchet) have nothing to do with the
justification of Mises’ frequency probability theory.

5. Kolmogorov Complexity. A.N. Kolmogorov tried to find foundations
of randomness by reducing this notion to the notion of complexity. Let
$L = \{0,1\}$ and $x \in L^{*}$.

Definition 8.1. (Kolmogorov) Let $A$ be an arbitrary algorithm. The
complexity of a word $x$ with respect to $A$ is $K_A(x) = \min l(r)$, where $\{r\}$ are
the programs which are able to realize the word $x$ with the aid of $A$.
Here $l(r)$ denotes the length of a program $r$. This definition depends on
the structure of the algorithm $A$. Later Kolmogorov proved the following
theorem:

Theorem 8.3. There exists an algorithm $A_0$ (optimal algorithm) such
that

$K_{A_0}(x) \preceq K_A(x)$ (8.2)

for every algorithm $A$.

As usual, (8.2) means that there exists a constant $C$ such that $K_{A_0}(x) \le
K_A(x) + C$ for all words $x$. An optimal algorithm $A_0$ is not unique.

Definition 8.2. The complexity $K(x)$ of the word $x$ is equal to the
complexity $K_{A_0}(x)$ with respect to one fixed (for all considerations) optimal
algorithm $A_0$.

The original idea of Kolmogorov [76], [77] was that the complexity $K(x_{1:n})$
of initial segments $x_{1:n}$ of a random sequence $x$ has to have the asymptotic
$K(x_{1:n}) \sim n$, $n \to \infty$, i.e., we might not find a short code for $x_{1:n}$. However,
this nice idea was rejected due to an objection of Per Martin-Löf [85]. To
discuss this objection and the connection of Kolmogorov complexity with
Martin-Löf randomness, it is better to use the conditional Kolmogorov complexity
$K(x; n)$ instead of the complexity $K(x)$. The complexity $K_A(x; n)$ is defined as the
length of a minimal program which produces the output $x$ on the basis of the
information that the length of the output $x$ is equal to $n$.
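
The dependence of $K_A$ on the algorithm $A$ in Definition 8.1 can be illustrated with one concrete, non-optimal choice. The following is only a sketch under a stated assumption: $A$ is taken to be the zlib decompressor of the Python standard library, so the length of a compressed word is the length of one program realizing that word, i.e. merely an upper estimate of the complexity with respect to this particular $A$. The names `compressed_length_bits`, `regular` and `typical` are hypothetical and do not come from the text.

```python
import os
import zlib

def compressed_length_bits(word: bytes) -> int:
    """Length in bits of one zlib encoding of `word`.

    Assumption: the algorithm A is the zlib decompressor, so this is the
    length of ONE program realizing `word`, hence only an upper estimate
    of min l(r) for this A (and A is not an optimal algorithm A_0).
    """
    return 8 * len(zlib.compress(word, 9))

n_bytes = 4096
regular = b"01" * (n_bytes // 2)   # a highly regular word
typical = os.urandom(n_bytes)      # bytes taken from the OS entropy source

k_regular = compressed_length_bits(regular)
k_typical = compressed_length_bits(typical)

# The regular word admits a short code; the typical one does not.
print("regular:", k_regular, "typical:", k_typical, "raw length:", 8 * n_bytes)
```

The comparison only shows the qualitative contrast between words with and without a short code; it says nothing about the constant $C$ in (8.2).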

Theorem 8.4. Let $f$ be a total recursive function such that $\sum_{n=1}^{\infty} 2^{-f(n)} =
\infty$. Then, for every sequence $x$, $K(x_{1:n}; n) < n - f(n)$ for infinitely many $n$.

In particular, we can choose $f(n) = \log_2 n$. Thus, for any binary sequence
$x$, $K(x_{1:n}; n) < n - \log_2 n$ for infinitely many $n$. Hence ‘Kolmogorov random
sequences’ do not exist.

P. Martin-Löf also obtained an estimate of $K(x_{1:n}; n)$ from below:

Theorem 8.5. Let $f$ be such that $\sum_{n=1}^{\infty} 2^{-f(n)} < \infty$. Then, with
probability one, $K(x_{1:n}; n) > n - f(n)$ for all but finitely many $n$.

In particular, we can choose $f(n) = 2\log_2 n$. Thus, for almost all binary
sequences $x$, $K(x_{1:n}; n) > n - 2\log_2 n$ for all but finitely many $n$. Therefore
for almost all binary sequences the Kolmogorov complexity $\phi(n) = K(x_{1:n}; n)$
oscillates between the graphs of the functions $g_{\max}(n) = n$ and $g_{\min}(n) = n -
2\log_2 n$ (with finitely many intersections with $g_{\min}(n)$). By Theorem 8.4, the
graph of the function $\phi(n)$ has infinitely many intersections with the graph of the
function $n - \log_2 n$.
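
The two choices of $f$ differ only in whether the series $\sum_n 2^{-f(n)}$ diverges or converges: for $f(n) = \log_2 n$ the terms are $1/n$, for $f(n) = 2\log_2 n$ they are $1/n^2$. A minimal numerical sketch of this difference follows; the helper name `partial_sum` is hypothetical, and finite partial sums can of course only suggest, not prove, divergence or convergence.

```python
import math

def partial_sum(f, n_max):
    """Partial sum of 2**(-f(n)) for n = 1, ..., n_max."""
    return sum(2.0 ** (-f(n)) for n in range(1, n_max + 1))

def log_case(n):
    # f(n) = log2 n, so 2**(-f(n)) = 1/n (harmonic series, divergent)
    return math.log2(n)

def two_log_case(n):
    # f(n) = 2 log2 n, so 2**(-f(n)) = 1/n**2 (convergent)
    return 2 * math.log2(n)

for n_max in (10**2, 10**3, 10**4, 10**5):
    print(n_max,
          round(partial_sum(log_case, n_max), 4),
          round(partial_sum(two_log_case, n_max), 6))
```

The first column of sums keeps growing (roughly like $\ln n$), while the second settles near $\pi^2/6$.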

The following two theorems [85] give the connection between high
Kolmogorov complexity (for infinitely many initial segments) and Martin-Löf
randomness:

Theorem 8.6. Let $f$ be a total recursive function such that $\sum_{n} 2^{-f(n)}$
is recursively convergent. Then, if $x$ is random in the sense of Martin-Löf,
$K(x_{1:n}; n) > n - f(n)$ for all but finitely many $n$.

That $\sum_{n} 2^{-f(n)}$ is recursively convergent means that there is a recursive
sequence $n_1, n_2, \ldots, n_k, \ldots$ such that

$\sum_{n = n_m + 1}^{\infty} 2^{-f(n)} \le 2^{-m}$, $m = 1, 2, \ldots$


Theorem 8.7. If there exists a constant $c$ such that $K(x_{1:n}; n) > n - c$ for
infinitely many $n$, then the sequence $x$ is random in the sense of Martin-Löf.

Critical Remarks:

1) Despite the great success of the Kolmogorov and Martin-Löf approaches,
it is doubtful that these approaches provide an adequate description of
randomness in physical reality. The main objection is against the use of recursive
functions (algorithms). On the one hand, there are no reasons to suppose that
random sequences produced by physical phenomena must pass all recursive
tests for randomness (even the law of large numbers). On the other hand,
‘randomness’ of such sequences may be characterized by some systems of
non-recursive transformations.

2) It seems impossible to reduce Martin-Löf randomness to Mises’
randomness. Denote the class of Martin-Löf random sequences (with respect
to the uniform distribution) by the symbol $RM$. The reduction of
Martin-Löf randomness to Mises’ randomness must be given by the equality $RM =
X(U, 1/2)$ for some class $U$ of place selections. However, it seems impossible
to find such a class $U$. For example, let $U = U_{Ch}$ be the class (semigroup) of
Church place selections. Then, as each $\phi \in U_{Ch}$ gives a recursive property of
probability one, we have $RM \subset X(U, 1/2)$. Ville’s result, combined with the
observation that the Martin-Löf random sequences satisfy the law of the
iterated logarithm, shows that the inclusion is strict. Moreover, it can be shown
(see M. van Lambalgen [103]) that the set of sequences $X(U, 1/2) \setminus RM$ is
rather large.

Therefore the approaches of Martin-Löf-Ville-Fréchet and von Mises give
totally different viewpoints on the notion of randomness. The first approach
is based on the ensemble interpretation of probability and the second
approach is based on the frequency interpretation of probability. As we have
already noticed, these interpretations could not be unified in one (mixed)
ensemble-frequency interpretation.


9 Operation of combining of collectives 


In the three basic operations discussed in section 3, one single collective $x$
served each time as the point of departure for the construction of a new collective.
We now consider the problem of combining two or more given collectives. We
start with S-sequences (sequences which satisfy the principle of statistical
stabilization).

Let $x = (x_j)$ and $y = (y_j)$ be two S-sequences with label sets $L_x$ and $L_y$,
respectively. We define a new sequence

$z = (z_j)$, $z_j = (x_j, y_j)$ (9.1)

(in general $z$ is not an S-sequence with respect to the label set $L_z = L_x \times L_y$).
Let $a \in L_x$ and $b \in L_y$. Among the first $N$ elements of $z$ there are $n_N(a; z)$
elements with the first component equal to $a$. As $n_N(a; z) = n_N(a; x)$
is the number of $x_j = a$ among the first $N$ elements of $x$, we obtain that
$\lim_{N\to\infty} n_N(a; z)/N = P_x(a)$. Among these $n_N(a; z)$ elements, there are a
number, say $n_N(b/a; z)$, whose second component is equal to $b$. The frequency
$\nu_N(a, b; z)$ of elements of the sequence $z$ labeled $(a, b)$ will then be

$\frac{n_N(b/a; z)}{N} = \frac{n_N(b/a; z)}{n_N(a; z)} \cdot \frac{n_N(a; z)}{N}.$

We set $\nu_N(b/a; z) = n_N(b/a; z)/n_N(a; z)$. Let us assume that, for each $a \in L_x$, the
subsequence $y(a)$ of $y$ which is obtained by choosing $y_j$ such that $x_j = a$ is
an S-sequence$^8$. Then, for each $a \in L_x$, $b \in L_y$, there exists

$P_z(b/a) = \lim_{N\to\infty} \nu_N(b/a; z) = \lim_{N\to\infty} \nu_N(b; y(a)) = P_{y(a)}(b).$

We have

$\sum_{b \in L_y} P_z(b/a) = 1.$ (9.2)

The existence of $P_z(b/a)$ implies the existence of $P_z(a, b) = \lim_{N\to\infty} \nu_N(a, b; z)$.
Moreover, we have

$P_z(a, b) = P_x(a) P_z(b/a)$ (9.3)


8 In general such a choice of the subsequence $y(a)$ of $y$ is not a place selection.

and $P_z(b/a) = P_z(a, b)/P_x(a)$, if $P_x(a) \ne 0$. By (9.2) and (9.3) we obtain

$\sum_{a \in L_x} \sum_{b \in L_y} P_z(a, b) = 1.$

Thus in this case the sequence $z$ is an S-sequence with the probability
distribution $P_z(a, b)$ defined by (9.3). The S-sequence $y$ is said to be combinable
with the S-sequence $x$. This relation is denoted by $\overrightarrow{xy}$. The relation of
combining is a symmetric relation on the set of pairs of S-sequences with
strictly positive probability distributions. To show this, we write


$\nu_N(a/b; z) = \frac{\nu_N(a, b; z)}{\nu_N(b; z)},$

$a \in L_x$, $b \in L_y$. If $\overrightarrow{xy}$ and $P_x(a) > 0$, $P_y(b) > 0$, $a \in L_x$, $b \in L_y$, then, for
each $b \in L_y$, $a \in L_x$, there exists

$P_z(a/b) = \lim_{N\to\infty} \nu_N(a/b; z) = \frac{P_z(a, b)}{P_y(b)} = \frac{P_x(a) P_z(b/a)}{P_y(b)}.$

Thus we obtain that $\overrightarrow{yx}$. On the other hand, if, for example, $P_y(b) = 0$ and
$P_z(a, b) = 0$, then in principle $\nu_N(a/b; z)$ may fluctuate. In that case $x$ is not
combinable with $y$. The previous considerations can be summarized as the
following proposition.

Proposition 9.1. Let $x$ and $y$ be two S-sequences with strictly positive
probability distributions. Then the following conditions are equivalent: 1) $\overrightarrow{xy}$;
2) $\overrightarrow{yx}$; 3) the sequence $z$ defined by (9.1) is an S-sequence.
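
For finite $N$ the decomposition behind (9.3) is an exact identity between relative frequencies: $\nu_N(a, b; z) = \nu_N(a; x)\,\nu_N(b; y(a))$ whenever $a$ occurs among the first $N$ elements of $x$. The sketch below checks this on two short, arbitrarily chosen label sequences; the data and the helper names `combine`, `freq` and `conditional_subsequence` are hypothetical, and a finite check says nothing about the existence of the limits.

```python
from fractions import Fraction

def combine(x, y):
    """Form z_j = (x_j, y_j) as in (9.1)."""
    return list(zip(x, y))

def freq(seq, label):
    """Relative frequency of `label` among the given elements of `seq`."""
    return Fraction(sum(1 for s in seq if s == label), len(seq))

def conditional_subsequence(x, y, a):
    """y(a): the elements y_j whose partner x_j equals a."""
    return [yj for xj, yj in zip(x, y) if xj == a]

# Arbitrary illustrative data (not taken from the text).
x = [0, 1, 0, 0, 1, 1, 0, 1]
y = [1, 1, 0, 1, 0, 1, 1, 0]
z = combine(x, y)

for a in (0, 1):
    y_a = conditional_subsequence(x, y, a)
    for b in (0, 1):
        joint = freq(z, (a, b))                # nu_N(a, b; z)
        product = freq(x, a) * freq(y_a, b)    # nu_N(a; x) * nu_N(b; y(a))
        print((a, b), joint, product)
```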

If $\overrightarrow{yx}$ and $\overrightarrow{xy}$, then $x$ and $y$ are said to be combinable. This relation is
denoted by $xy$. If $\overrightarrow{xy}$, then the conditional probabilities $P_z(b/a)$ are well
defined even if $P_x(a) = 0$. Typically such probabilities do not play any role
in probabilistic considerations. We say that $y$ is (mod $P_x$)-combinable with
$x$, $\overrightarrow{xy}$ (mod $P_x$), if, for each $a \in L_x$ with $P_x(a) > 0$, the sequence $y(a)$ is an
S-sequence. By Proposition 9.1 we have $\overrightarrow{xy}$ (mod $P_x$) $\Leftrightarrow$ $z$ is an S-sequence
$\Leftrightarrow$ $\overrightarrow{yx}$ (mod $P_y$). Thus we need not use the arrow to denote this relation of
combining. This relation will be denoted as $xy$ (mod $P$).

We introduce the operation of combining for collectives. We start with
some preliminary considerations on place selections. Let $x = (x_j)$, $x_j \in L_x$,
and $y = (y_j)$, $y_j \in L_y$, be two arbitrary sequences and let $z = (z_j)$, $z_j =
(x_j, y_j)$. Let $\Phi_1$, $\Phi_2$ and $G$ be some systems of place selections operating in
$x$, $y$ and $z$, respectively. Let $\phi$ belong to $G$: $\phi z = (z_{n_1}, z_{n_2}, \ldots, z_{n_k}, \ldots)$.
We set $\phi^{(1)} x = (x_{n_k})$ and $\phi^{(2)} y = (y_{n_k})$. It should be noticed that in general
$\phi^{(1)}$ and $\phi^{(2)}$ are not place selections in $x$ and $y$, respectively$^9$. We set $G_1 =
\{f = \phi^{(1)} : \phi \in G\}$, $G_2 = \{g = \phi^{(2)} : \phi \in G\}$. Let $x = (x_j)$ and $y = (y_j)$
be $\Phi_1$ and $\Phi_2$ collectives, respectively. Let $G$ be a system of place selections
operating in $z = (z_j)$, $z_j = (x_j, y_j)$, such that $\Phi_1 \subset G_1$ and $\Phi_2 \subset G_2$. If
$x$ and $y$ are mod $P$-combinable as S-sequences, then they are said to be
(mod $P$, $G$)-combinable collectives if: (1) the limits $P_x(a)$, $a \in L_x$, and
$P_y(b)$, $b \in L_y$, are insensitive to transformations belonging to $G_1$ and $G_2$,
respectively; (2) the limits $P_z(b/a)$, $P_x(a) > 0$, and $P_z(a/b)$, $P_y(b) > 0$, are
insensitive to place selections belonging to $G$. We can easily prove that $x$ and
$y$ are (mod $P$, $G$)-combinable collectives iff $z$ is a $G$-collective.

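Proof. 1) Let $x$ and $y$ be (mod $P$, $G$)-combinable. Let $\phi \in G$. For
$P_x(a) > 0$, we have:

$P_{\phi z}(a, b) = \lim_{N\to\infty} \nu_N(a, b; \phi z) = \lim_{N\to\infty} \nu_N(b/a; \phi z)\, \nu_N(a; \phi z)$
$= \lim_{N\to\infty} \nu_N(b/a; \phi z)\, \nu_N(a; \phi^{(1)} x) = P_z(b/a) P_x(a) = P_z(a, b).$

For $P_x(a) = 0$ we have: $\nu_N(a, b; \phi z) \le \nu_N(a; \phi z) = \nu_N(a; \phi^{(1)} x)$. But

$\lim_{N\to\infty} \nu_N(a; \phi^{(1)} x) = \lim_{N\to\infty} \nu_N(a; x) = P_x(a) = 0.$

Thus $P_{\phi z}(a, b) = 0 = P_z(a, b)$.

2) Let $z$ be a $G$-collective. Then we obtain

$P_{\phi^{(1)} x}(a) = \sum_{b \in L_y} P_{\phi z}(a, b) = \sum_{b \in L_y} P_z(a, b) = P_x(a).$

In the same way we obtain that $P_{\phi^{(2)} y}(b) = P_y(b)$. Finally, we have

$P_{\phi z}(b/a) = P_{\phi z}(a, b)/P_{\phi^{(1)} x}(a) = P_z(a, b)/P_x(a) = P_z(b/a),$

for $P_x(a) > 0$.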
9 Let $\phi$ be defined by a function $f = (f_1, f_2(z_1), f_3(z_1, z_2), \ldots)$, $f_j = 0, 1$. If
$f_n(z_1, z_2, \ldots, z_{n-1}) = 1$, then the element $z_n$ is chosen for a new sequence. Let $x =
(x_1, x_2, \ldots, x_n, \ldots)$ be a sequence and let $y = (x_m, x_{m+1}, \ldots)$, $m > 1$. Here $z = (z_j)$ has the
form: $z_1 = (x_1, x_m)$, $z_2 = (x_2, x_{m+1})$, .... Thus, in particular, $f_2(z_1)$ depends (in general)
not only on $x_1$ but also on $x_m$. Therefore $\phi^{(1)}$ is not a place selection for $x$.

The reader can easily define for the collectives $x$ and $y$ the relations $\overrightarrow{yx}$, $\overrightarrow{xy}$
and $xy$ with respect to $G$ which are not based, respectively, on mod $P_y$,
mod $P_x$ and mod $P$ factorizations. In general $\overrightarrow{yx}$ does not imply $\overrightarrow{xy}$ or that
the sequence $z$ is a $G$-collective (and vice versa).
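
The remark that a place selection in $z$ need not induce a place selection in $x$ can be made concrete for the shifted sequence $y = (x_m, x_{m+1}, \ldots)$ with $m = 2$. In the sketch below the rule looks only at the previous element $z_{n-1}$, so it is a legitimate place selection in $z$; but the second component of $z_{n-1}$ is $x_n$ itself, so the induced choice in $x$ peeks at the very element being chosen. The pseudo-random bits, the seed and the names `place_selection` and `rule` are hypothetical stand-ins, not part of the text.

```python
import random

def place_selection(z, rule):
    """Indices n (0-based) chosen by `rule`, which sees only z[0..n-1]."""
    chosen = []
    for n in range(len(z)):
        if rule(z[:n]):          # the decision never looks at z[n] itself
            chosen.append(n)
    return chosen

rng = random.Random(0)           # fixed seed: pseudo-random stand-in for x
bits = [rng.randint(0, 1) for _ in range(2001)]

m = 2                            # shift, as in y = (x_m, x_{m+1}, ...)
x = bits[:len(bits) - (m - 1)]   # x_1, x_2, ...
y = bits[m - 1:]                 # x_m, x_{m+1}, ...
z = list(zip(x, y))

# Choose z_n if the second component of z_{n-1} equals 1.
def rule(history):
    return bool(history) and history[-1][1] == 1

indices = place_selection(z, rule)
phi1_x = [x[n] for n in indices]         # induced subsequence of x

print("frequency of 1 in x:       ", sum(x) / len(x))
print("frequency of 1 in phi^(1)x:", sum(phi1_x) / len(phi1_x))
```

With $m = 2$ every chosen $x_n$ equals 1, so the frequency in the induced subsequence is 1 although the frequency in $x$ is close to 1/2.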

Probabilities $P_z(a/b)$, $P_z(b/a)$, $a \in L_x$, $b \in L_y$, have the meaning of
conditional probabilities. If $S_x$ and $S_y$ are statistical experiments which
generate $x$ and $y$, respectively, then, for example, $P_z(b/a)$ is nothing other
than the conditional probability of the result $b$ for $S_y$ if we knew that $a$ was
the result of $S_x$. It is easy to show that the probabilities $P_z(b/a)$ (or $P_z(a/b)$) can
be obtained on the basis of the general definition of conditional probabilities
based on the operation of partition. For each $a \in L_x$, we consider the set
$A_a = \{u = (a, b), b \in L_y\} \subset L_x \times L_y$ and the point set $B_{ab} = \{(a, b)\} \subset A_a$.
Let $z$ be an S-sequence (in particular, a collective). It is easy to see that the
conditional probability $P(B_{ab}/A_a)$ (for $P(A_a) > 0$) defined on the basis of the
operation of partition for $z$ coincides with the probability $P_z(b/a)$. However,
the approach based on the operation of combining seems more attractive
than the approach based on the operation of partition. In the first case
the conditional probabilities have the natural interpretation as a measure of
dependence between the collectives $x$ and $y$.


10 Independence of collectives 


Let $x$ and $y$ be S-sequences and let $\overrightarrow{xy}$. Then $y$ is said to be independent from
$x$ if all S-sequences $y(a)$, $a \in L_x$, have the same probability distribution,
which coincides with the probability distribution $P_y$ of $y$. This implies that

$P_z(b/a) = \lim_{N\to\infty} \nu_N(b/a; z) = \lim_{N\to\infty} \nu_N(b; y(a)) = P_y(b).$

Hence

$P_z(a, b) = P_x(a) P_y(b)$, $a \in L_x$, $b \in L_y$. (10.1)


Thus the independence implies the factorization of the two-dimensional
probability $P_z(a, b)$. However, in general the multiplication rule (10.1) does not
imply independence. If (10.1) holds, but $P_x(a) = 0$, then in principle $P_z(b/a)$
may depend on $a$ (or it may be that $P_z(b/a) = \mathrm{Const} \ne P_y(b)$). For similar
reasons the condition “$y$ is independent from $x$” does not imply that $x$ is
independent from $y$. Dependence on $a$ such that $P_x(a) = 0$ (or $b$, $P_y(b) = 0$) does
not play any role in probabilistic considerations$^{10}$. Therefore it is natural to
consider (mod $P$)-independence.

Let $x$ and $y$ be two (mod $P$)-combinable S-sequences (or collectives).
They are said to be (mod $P$)-independent if (a) $P_{y(a)} = P_y$ for all $a \in L_x$,
$P_x(a) > 0$ and (b) $P_{x(b)} = P_x$ for all $b \in L_y$, $P_y(b) > 0$. In fact, (a) implies
(b) and vice versa. For instance, let (a) take place. Then for $P_y(b) > 0$, we
have

$P_z(a/b) = P_z(a, b)/P_y(b) = P_z(b/a) P_x(a)/P_y(b)$

$= P_{y(a)}(b) P_x(a)/P_y(b) = P_x(a).$

It is evident that the multiplication rule (10.1) holds for (mod $P$)-independent
sequences. On the other hand, if $xy$ (mod $P$) and the multiplication rule
(10.1) hold, then $x$ and $y$ are (mod $P$)-independent.

Remark 10.1. The reader can easily generalize the frequency approach 
to conditional probabilities and independence to countable sets of labels. Non- 
countable sets of labels were considered in [102]. 

Example 10.1. Assume that two coins are tossed simultaneously, the
corresponding sequences being $x$ and $y$ (with $L_x = L_y = \{0, 1\}$). Our experience
says that in mathematical models we can assume that $x$ and $y$ are collectives with
probability distributions $P_x(a)$, $P_y(b)$, $a, b = 0, 1$. We choose two subsequences
of $y$: (1) $y(0) = (y_{j_k})$, where, for the first coin, $x_{j_k} = 0$; (2) $y(1) = (y_{i_k})$, where,
for the first coin, $x_{i_k} = 1$. Our experience says that in the mathematical model
(for ordinary coins) we can assume that there exist $P(b/0) = \lim_{N\to\infty} \nu_N(b; y(0))$
and $P(b/1) = \lim_{N\to\infty} \nu_N(b; y(1))$, $b = 0, 1$. If the tossing of the second coin does not
depend in any way on the tossing of the first coin, then the relative frequencies in
$y(0)$ and $y(1)$ have the same behaviour as the relative frequencies in $y$ (this is again an
experimental fact). Thus we can assume that the collectives $x$ and $y$ are independent.
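
The comparison of relative frequencies in $y$, $y(0)$ and $y(1)$ can be imitated numerically. The following sketch replaces the two coins by a seeded pseudo-random generator, which is an assumption made purely for illustration (a pseudo-random sequence is not a collective, and a finite run cannot establish limits); the seed, the sample size `N` and the name `relative_frequency` are hypothetical.

```python
import random

def relative_frequency(seq, label):
    """Relative frequency of `label` among the elements of `seq`."""
    return sum(1 for s in seq if s == label) / len(seq)

rng = random.Random(12345)   # fixed seed; stand-in for two ordinary coins
N = 100_000
x = [rng.randint(0, 1) for _ in range(N)]   # first coin
y = [rng.randint(0, 1) for _ in range(N)]   # second coin, generated separately

# Subsequences of y selected by the outcome of the first coin.
y0 = [yj for xj, yj in zip(x, y) if xj == 0]   # y(0)
y1 = [yj for xj, yj in zip(x, y) if xj == 1]   # y(1)

for name, seq in (("y", y), ("y(0)", y0), ("y(1)", y1)):
    print(name, round(relative_frequency(seq, 1), 4))
```

In such a run the three frequencies agree up to statistical fluctuations, which is the finite-sample counterpart of $P_{y(0)} = P_{y(1)} = P_y$.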

Example 10.2. Assume that an urn contains balls each marked with a
number $a$, where $a$ belongs to the set $S = \{a_1, \ldots, a_n\}$. The sequence $x$ is induced
by the experiment $S_x$: we draw a ball from the urn, write its label and return
it into the urn. The sequence $y$ is induced by the experiment $S_y$: after drawing
the first ball and before returning it, a second ball is drawn from the urn and
its label is written. As usual, we define subsequences $y(a_j)$, $j = 1, \ldots, n$, of $y$.
Our experience says that in the mathematical model we can assume that $x$, $y$ and
$y(a_j)$ are collectives and $x$ and $y$ are combinable. Thus the conditional probabilities


10 Of course, we could not completely exclude the possibility that there may exist physical
phenomena in which the dependence on labels having zero probabilities plays some role.

$P(b_i/a_j) = P_{y(a_j)}(b_i)$ are well defined. However, if the distribution of balls in the
urn is not symmetric, then $P(b_i/a_j)$ depends on $a_j$. Thus the collectives $x$ and $y$
are not independent.
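
The dependence in the urn example can be computed exactly for a small urn. The sketch below uses classical (proportion-based) probabilities as a stand-in for the limiting frequencies of the collectives, which is an assumption; the urn composition and the names `p_first`, `p_second_given_first` and `p_second` are hypothetical.

```python
from fractions import Fraction

# Hypothetical urn: label -> number of balls carrying that label.
urn = {"a1": 3, "a2": 1}
total = sum(urn.values())

def p_first(a):
    """Probability that the first ball drawn carries label a."""
    return Fraction(urn[a], total)

def p_second_given_first(b, a):
    """Probability of label b on the second ball, the first (label a) not yet returned."""
    remaining = urn[b] - (1 if a == b else 0)
    return Fraction(remaining, total - 1)

def p_second(b):
    """Unconditional probability of label b on the second ball."""
    return sum(p_first(a) * p_second_given_first(b, a) for a in urn)

for a in urn:
    for b in urn:
        print(f"P({b}/{a}) =", p_second_given_first(b, a), " P_y({}) =".format(b), p_second(b))
```

Here $P(a_1/a_1) = 2/3$ while $P(a_1/a_2) = 1$, so the conditional distribution depends on the first label even though the second draw has the same unconditional distribution as the first.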


11 Frequency and measure-theoretical viewpoints on independence


If we use the frequency approach and take combining as our starting point,
then the mathematical and physical conditions for independence concern
the interconnection of the two one-dimensional collectives $x$ and $y$ (two
statistical experiments $S_x$ and $S_y$) or, in terms of $P_z(a, b)$, the type of this
two-dimensional distribution: factorization (10.1); they are not concerned
with properties of each single collective (statistical experiment). On the
other hand, the measure-theoretical definition (4.1), (4.2) of independence does
not relate in any way to a two-dimensional distribution. Of course, definition
(4.1), (4.2) can be considered as a generalization of the factorization rule
(10.1). However, (4.1), (4.2) extends (10.1) too much. In general (4.1),
(4.2) has no relation to the original physical motivations of independence.
We wish to consider this problem carefully. Consider the following
example [88]: The label space $S$ consists of six points $1, \ldots, 6$ with distribution
$p_i$, $i = 1, \ldots, 6$; the event (or set) $A$ consists of the three points 2, 3, 4, the
event $C$ of the two points 1, 2; the intersection $A \cap C$ is the point 2, and
$P(C/A) = p_2/(p_2 + p_3 + p_4)$ (due to the measure-theoretical definition (2.4) or the
frequency definition (3.4), which is based on the operation of partition). Now
the following question is asked: Under what conditions is $P(C/A)$ equal to
$P(C)$ (or $P(C \cap A) = P(A)P(C)$)? In our example: when is $p_2/(p_2 + p_3 + p_4) = p_1 + p_2$?
The example is so chosen that this is true for $p_i = 1/6$, $i = 1, \ldots, 6$. The
statement is then made that, in this case, the events $A$ and $C$ are
independent. Let us analyze this statement.

Let us consider a set $A$ consisting of the points 2, 3, 4 and a set $C$ consisting of the point
2; here $C \subset A$. Then $P(2/A) = p_2/(p_2 + p_3 + p_4)$. Here $P(2/A)$ certainly does
not remain unchanged if we vary the set $A$, and certainly for no $A$, $P(A) \ne 1$,
is $P(2/A)$ equal to $p_2$. Now, however, in order to make such an equality
possible, one considers other sets $C$ such that $A \cap C = \{2\}$ but $C \supset A \cap C$.
Such subsets of $S$ are, for example, $C_1 = \{1, 2\}$, $C_2 = \{2, 5\}$, $C_3 = \{1, 2, 5, 6\}$,
$C_4 = \{1, 2, 5\}$. Then, for each of these $C_i$, $P(C_i/A) = p_2/(p_2 + p_3 + p_4)$. Thus,
having the choice of sets $C_i$, one may ask whether for one or more of them,
and with some given distribution, $P(C/A) = P(C)$. If all $p_i = 1/6$, this
holds true for $C_1 = \{1, 2\}$ or for $C_2 = \{2, 5\}$ but not for $C_3$ or $C_4$. If we
take $p_1 = p_5 = 1/12$, $p_2 = p_3 = p_4 = 1/6$, $p_6 = 1/3$, then the above
equality holds for $C_4 = \{1, 2, 5\}$ but no longer for $C_1$ and $C_2$, and so on. It
seems that the measure-theoretical definition leaves room for purely
numerical accidents. From the physical point of view, it is not clear: What
is the meaning of the statement that, for a given distribution, “the events
$A = \{2, 3, 4\}$ and $C = \{2, 5\}$ are independent” while “the events $\{2, 3, 4\}$
and $\{1, 2, 5\}$ are dependent” or “events $\{1, 6\}$ and $\{2, 3, 4\}$ are dependent”?
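
The numerical accidents described above can be checked with exact rational arithmetic. The sketch below tests the factorization $P(A \cap C) = P(A)P(C)$ for the four candidate sets under both distributions; it assumes the reading $p_1 = p_5 = 1/12$, $p_2 = p_3 = p_4 = 1/6$, $p_6 = 1/3$ for the second distribution, which is the assignment for which the stated equality for $\{1, 2, 5\}$ holds. The names `prob`, `factorizes`, `uniform` and `skewed` are hypothetical.

```python
from fractions import Fraction as F

def prob(event, p):
    """Probability of a set of labels under the distribution p."""
    return sum((p[i] for i in event), F(0))

def factorizes(A, C, p):
    """True if P(A & C) == P(A) * P(C), i.e. the events are 'independent'."""
    return prob(A & C, p) == prob(A, p) * prob(C, p)

uniform = {i: F(1, 6) for i in range(1, 7)}
skewed = {1: F(1, 12), 2: F(1, 6), 3: F(1, 6), 4: F(1, 6), 5: F(1, 12), 6: F(1, 3)}

A = {2, 3, 4}
candidates = {"C1": {1, 2}, "C2": {2, 5}, "C3": {1, 2, 5, 6}, "C4": {1, 2, 5}}

results = {
    name: {label: factorizes(A, C, p) for label, C in candidates.items()}
    for name, p in (("uniform", uniform), ("skewed", skewed))
}
for name, row in results.items():
    print(name, row)
```

Under the uniform distribution only $C_1$ and $C_2$ factorize with $A$; under the second distribution only $C_4$ does, although the sets themselves have not changed.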

One may say that the intersection of two sets $A$ and $C$ has the ‘property’
of belonging to $A$ and the ‘property’ of belonging to $C$ (and many others).
Nevertheless, the label “2”, the result of the ordinary tossing of one die,
is not a two-dimensional label like “blond hair, blue eyes” or “first die 3,
second die 5”. A concept of independence of two ‘properties’ which
may or may not influence each other is meaningful. However, this concept
must be discussed on the basis of the procedure of combining of collectives
corresponding to measurements of these properties.

Conclusion. Independence should be defined for collectives rather than 
for isolated events. 


12 Generalization of the operation of combining


In fact, to consider the relation of combining $\overrightarrow{xy}$ we need not start with two
collectives (or S-sequences) $x$ and $y$. It suffices to have one collective $x$ and
a family $\{y(a)\}_{a \in L_x}$ of collectives having the same label set $L_y$. We denote
the system $(x, \{y(a)\}_{a \in L_x})$ by the symbol $U_{xy}$. In this framework we can also
define conditional probabilities $P_{U_{xy}}(b/a) = P_{y(a)}(b)$, $a \in L_x$, $b \in L_y$, and the
two-dimensional probability distribution

$P_{U_{xy}}(a, b) = P_{U_{xy}}(b/a) P_x(a)$, $a \in L_x$, $b \in L_y$.

In fact, a sequence $z = (z_j)$ corresponding to measurements of pairs $z_j =
(x_j, y_j)$ may not be defined. Such a situation is common for measurements of
so-called incompatible observables in quantum mechanics (i.e., observables
represented by noncommuting operators), see Chapter 2. In that case it is
impossible to perform a simultaneous measurement of two observables $x$ and
$y$ (i.e., we could not form the collective $z$). Nevertheless, we could speak
about two properties $A$ and $B$ of the physical system. The conditional
probability $P_{U_{xy}}(b/a)$ has the following meaning: if the result of a measurement
of the property $A$ is equal to $a$, then the probability to obtain the value $b$ of the
property $B$ is equal to $P_{U_{xy}}(b/a)$.

We suppose now that it is also possible to perform a measurement of the
property $B$ and, for each $B = b$, to perform a measurement of the property
$A$. Mathematically such measurements are described by a collective $y$
(corresponding to a measurement of $B$) and a system $\{x(b)\}_{b \in L_y}$ of collectives
(corresponding to measurements of $A$ under the condition $B = b$). Thus we
have the system $U_{yx} = (y, \{x(b)\}_{b \in L_y})$. Here we can also define the
conditional probabilities $P_{U_{yx}}(a/b) = P_{x(b)}(a)$ and the two-dimensional probability distribution

$P_{U_{yx}}(b, a) = P_{U_{yx}}(a/b) P_y(b)$, $a \in L_x$, $b \in L_y$.

It may be that $P_{U_{xy}}(a, b) \ne P_{U_{yx}}(b, a)$. In such a case the two-dimensional
probability distribution $P(a, b)$ corresponding to pairs $(A = a, B = b)$ does
not exist.
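
A small numerical sketch shows how the two constructions can disagree. All numbers below are hypothetical and chosen only to make the arithmetic transparent; they are not derived from any physical model, and the dictionary names are illustrative.

```python
from fractions import Fraction as F

# System U_xy: distribution of x and of the conditional collectives y(a).
P_x = {0: F(1, 2), 1: F(1, 2)}
P_y_of = {0: {0: F(1, 2), 1: F(1, 2)},     # P_{y(0)}(b)
          1: {0: F(1, 2), 1: F(1, 2)}}     # P_{y(1)}(b)

# System U_yx: distribution of y and of the conditional collectives x(b).
P_y = {0: F(1, 2), 1: F(1, 2)}
P_x_of = {0: {0: F(1), 1: F(0)},           # P_{x(0)}(a)
          1: {0: F(0), 1: F(1)}}           # P_{x(1)}(a)

# Two-dimensional distributions, both indexed by the pair (a, b).
joint_xy = {(a, b): P_y_of[a][b] * P_x[a] for a in P_x for b in P_y}
joint_yx = {(a, b): P_x_of[b][a] * P_y[b] for a in P_x for b in P_y}

for pair in sorted(joint_xy):
    print(pair, joint_xy[pair], joint_yx[pair])
```

Both tables are normalized, yet they assign different weights to the same pairs, so no single distribution $P(a, b)$ reproduces both families of conditional probabilities.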


13 Comparative probability 


All probability models discussed in the previous sections are called
quantitative probability models. The terse quantitative statement “P(A) = p”, read
“the probability of A is p”, is the basis of these theories$^{11}$. On the other hand,
the modal or classificatory statement “A is probable” or “A is likely” seems
to be most common in ordinary discourse. To formalize such an approach,
we can consider, for example, a binary relation $\succsim$ on the set $D$ of events
(i.e., a subset of $D \times D$). This relation can be read as follows: if the pair $(A, B)$
belongs to the relation, then $A$ is at least as probable as $B$, $A \succsim B$. Such a
formalization gives the so-called comparative probability formalism (see, for
example, T. Fine [39]).

Comparative probability induces a more extended class of probability
models (with larger domains of application) than quantitative probability.

For example, having observed that 10 tosses of a strange coin resulted
in 7 heads, we are more justified in asserting that “heads are more
probable than tails” than in asserting that “the probability of heads is 0.7”. There
exist relatively simple mathematical models in which we consider valid
comparative probability statements that are incompatible with any
representation in quantitative theory$^{12}$.

11 There arises a natural question: Why do we consider only real numbers p as quantities?

However, my opinion is that comparative probability models have to be
considered as “derivatives” of the three fundamental models (classical,
frequency and ensemble). To define the binary relation $\succsim$, we need to use one
of the fundamental models (or their generalizations).

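Typically it is assumed that the binary relation $\succsim$ satisfies the following
axioms (see T. Fine [39], p.17):

C0. (Nontriviality) $\Omega \succ \emptyset$, where $\emptyset$ is the null or empty set.

C1. (Comparability) $A \succsim B$ or $B \succsim A$.

C2. (Transitivity) $A \succsim B$, $B \succsim C \Rightarrow A \succsim C$.

C3. (Improbability of impossibility) $A \succsim \emptyset$.

C4. (Disjoint unions) $A \cap (B \cup C) = \emptyset \Rightarrow (B \succsim C \Leftrightarrow A \cup B \succsim A \cup C)$.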
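
One way to see how these axioms relate to a quantitative model is to define $A \succsim B$ by $P(A) \ge P(B)$ for an ordinary real-valued probability on a small finite space and check C0-C4 by enumeration. The sketch below does this for a three-point space; the weights and the names `P`, `at_least` and `axioms` are hypothetical, and the check covers only this one induced relation.

```python
from fractions import Fraction
from itertools import combinations, product

omega = frozenset({1, 2, 3})
# Hypothetical real-valued probability on omega.
weights = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
empty = frozenset()

# All events: every subset of omega.
events = [frozenset(c) for r in range(len(omega) + 1)
          for c in combinations(sorted(omega), r)]

def P(event):
    return sum((weights[w] for w in event), Fraction(0))

def at_least(A, B):
    """Induced comparative relation: A is at least as probable as B."""
    return P(A) >= P(B)

axioms = {
    "C0": at_least(omega, empty) and not at_least(empty, omega),
    "C1": all(at_least(A, B) or at_least(B, A)
              for A, B in product(events, repeat=2)),
    "C2": all(at_least(A, C)
              for A, B, C in product(events, repeat=3)
              if at_least(A, B) and at_least(B, C)),
    "C3": all(at_least(A, empty) for A in events),
    "C4": all(at_least(B, C) == at_least(A | B, A | C)
              for A, B, C in product(events, repeat=3)
              if not A & (B | C)),
}
print(axioms)
```

For a relation induced in this way all five axioms hold; the models mentioned in the text are interesting precisely because they are not of this form.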


Axioms C1 and C2 establish that the relation $\succsim$ is a linear complete order.
The requirement that all events be comparable is not insignificant and has
been denied by some authors [39]. To illustrate the latter possibility, we
consider the following example. There is an ensemble $S$, $|S| = N$, of coins
having different centers of mass. The first coin tossing experiment (for all
coins $s \in S$) gave $N_1$ heads and the second experiment gave $N_2$ heads. If
$N_1 > M_1 = N - N_1$ but $N_2 < M_2 = N - N_2$, then we can assert neither
“heads are at least as probable as tails” ($A \succsim B$) nor “tails are at least as
probable as heads”.

In Chapter 4 we construct a quantitative probability model (with the
field of p-adic numbers as quantitative space) that induces a comparative
probability model in which there exist noncomparable events. In this model
the axiom (C4) is also violated.

Remark 13.1. (Subjective probability as comparative probability). It
seems that the comparative interpretation is one of the possible interpretations
of subjective probability. We remark that it does not sound reasonable to
use the fixed ordered set (the segment [0, 1]) for quantitative representation
of subjective probabilities. The use of [0, 1] is the root of misunderstandings
related to subjective probability. This implies that numbers $p \in [0, 1]$ are
often interpreted as frequency or ensemble probabilities.

12 At least if R is used as a “quantitative space”.

Chapter 2


Quantum probabilities 


In this Chapter probability interpretations of quantum mechanics will be dis- 
cussed. The main attention will be paid to comparative analysis of statistical 
measurements for classical and quantum physical systems. We shall see that 
there is (at least formally) a difference in statistical behaviors of classical and 
quantum physical systems. We present a purely probabilistic explanation of 
this difference. 


1 Classical and quantum probability rules 


1. Properties of physical systems. The notion of a property of a physical
system$^1$ will play an important role in our analysis. Therefore we shall
discuss this notion. In this discussion we shall use not only physical but also
philosophical arguments. The reader who is not interested in such general
discussions may start directly with the probability interpretation of quantum
states (see subsection 2).

Before starting the discussion, we consider the following simple examples of
physical properties.

Example 1.1. Let $S$ be an ensemble of rigid bodies. Suppose that these
bodies have one of two colours, black or white, and one of two forms, ball
or cube. The colour and form are properties of $s \in S$. Numerically these
properties, $A$ and $B$, can be described by quantities $A = 0, 1$ for black and
white bodies, respectively, and $B = 0, 1$ for ball and cube, respectively.

1 In fact, the notion of a property of a physical system is (more or less) equivalent to
the standard notion of a physical observable. However, an observable must be connected
with some observation (measurement). On the other hand, we wish to consider properties
of physical systems which in principle are not related to measurements.

Example 1.2. Let $a$ be a particle (classical or quantum). The position
$q$ and momentum $p$ of $a$ are properties of $a$. Numerically these properties
are described by a continuous spectrum of values (by the field of real numbers
$\mathbf{R}$). In what follows we shall mainly study properties which are described by
discrete spectra of values. In the case of the position and momentum we can
make the following discretization. Let $D_q$ and $D_p$ be domains in $\mathbf{R}^3$. We set
$A = 1$ if $q \in D_q$ and $A = 0$ if $q \notin D_q$; $B = 1$ if $p \in D_p$ and $B = 0$ if $p \notin D_p$.
The quantities $A$ and $B$ are properties of $a$. In fact, the following question
will be studied in this section: What is the difference in statistical behaviour
of the properties $A$ and $B$ for classical and quantum particles?

In the physical community there is no unique point of view on the
notion of a property of a quantum physical system (see, for example, [36],
[43], [109], [7], [8], [10]-[12], [13], [14], [27], [28], [33], [47], [24], [46], [29], [30],
[81] for the details). Some scientists keep to realism. They assume that a
property is an objective characteristic of a quantum system. Thus a property
is a property of the object. Such a property is not related to the act of a
measurement. It does not depend on a subjective observer. In particular,
adherents of realism (L. de Broglie, A. Einstein, D. Bohm,...) think that
both classical and quantum particles have well defined (localized) positions
and momenta. Adherents of realism can be split into two subgroups. This
splitting is based on two different viewpoints on the following problem. Does
the quantum formalism operate with initial values of properties (i.e., values
before acts of measurements) or final values of properties (i.e., values after
acts of measurements)? This problem is very important in quantum
mechanics, because here a measurement can crucially change the values of properties of
physical systems. We shall call adherents of the initial values hypothesis
i-realists and adherents of the final values hypothesis f-realists$^2$.

Another part of the physical community supports the ideas of empiricism.
They assume that the quantum formalism does not describe microreality as
such. Properties obtained via quantum measurements are not properties
of quantum systems (not properties of the object). They are merely
properties of measurement phenomena (properties of instruments and the physical
circumstances in which these instruments are used). In particular, adherents
of empiricism (N. Bohr, W. Heisenberg, J. von Neumann,...) claim that
positions and momenta of quantum particles are not objective. For example,
an electron has no definite position before the act of a measurement. It is
just a ‘cloud’ which is collapsed into a point by the act of a measurement.
Many adherents of empiricism think that a property depends not only on a
measurement procedure $M$, but also on a so-called preparation procedure $\mathcal{E}$
(see section 2 for the details) which is used to prepare quantum systems for
acts of measurements. We call the adherents of the latter viewpoint
$\mathcal{E}/M$-empiricists.

2 For example, any measurement of the position of a quantum particle should change
the localization of this particle. i-realists suppose that a quantum measurement gives the
initial value of the position, $q = q_i$; f-realists suppose that a quantum measurement gives
the final value of the position, $q = q_f$.

Example 1.3. (Nonobjective property) Let us keep to $\mathcal{E}/M$-empiricism.
As cats cannot fly, the speed of flying $v$ cannot be considered as an objective
property of the cat. We consider the following procedure $\mathcal{E}$ to prepare cats to
fly: each cat is located near the desk of an airplane which has a system of the
autopilot; by manipulations with buttons a cat can change the speed $v$ of the
airplane. A large statistical ensemble $S$ of cats in airplanes is prepared by $\mathcal{E}$.
A measurement procedure $M$ is a measurement of the speed $v$ of an airplane
with the cat. The $v$ is not a property of the cat (on the other hand, it is not
just a property of the airplane). It is merely a property of the preparation
procedure $\mathcal{E}$. If cats can choose only a finite set of speeds $v_1, \ldots, v_k$, then the
measurement $M$ will produce a discrete probability distribution $P(v = v_i)$, $i =
1, \ldots, k$.

Empiricism is often identified with idealism. From the idealist viewpoint
quantum systems have no objective properties at all. This approach
immediately implies a death of reality (not only the reality of the microworld, but
also the reality of the macroworld, which is composed of microsystems). However,
in principle empiricism need not imply idealism. It is very well possible to
believe in the objective existence of atoms and electrons without being
committed to the thesis that this reality is described by the quantum mechanical
formalism.

The realist philosophy is very attractive for scientists working in
classical physics. However, we shall see that the realist viewpoint induces some
problems (the so-called Einstein-Podolsky-Rosen paradox) in the foundations of
quantum physics$^3$. The empiricist approach seems to be free of such
problems. However, empiricism is not so attractive as the philosophical basis for the
investigation of reality. Even if we do not keep to idealism and do not deny the
existence of objective reality (which is independent of our observations), then by
the empiricist ideology we still have to assume that the quantum formalism
describes not the objective reality of the microworld, but only the reality of the equipment
in our laboratories.

3 In fact, we shall show that these problems have a purely mathematical origin and they
are connected only with the foundations of probability theory.

Of course, such a classification of members of the physical community is
not rigid. Some of them balance on the boundary between different
viewpoints.

We shall see that different viewpoints on the notion of a property
imply different viewpoints on the notion of probability. For example, i-realists
can use both ensemble and frequency probability formalisms; f-realists and
empiricists can use only the frequency probability formalism.

2. Probability interpretation of a quantum state. We now discuss
a probability interpretation of quantum mechanics. We may restrict our
considerations to two dimensional quantum systems. Already such quantum
systems demonstrate all delicate features of this problem. Let us consider
a large statistical ensemble $S$ of quantum systems⁴ $s$ having two properties,
$A$ and $B$. Let $H = \mathbf{C} \times \mathbf{C}$ be the two dimensional complex linear space
with the inner product $(\cdot, \cdot) : H \times H \to \mathbf{C}$. In the quantum formalism the
statistical ensemble $S$ is described by a normalized vector $\varphi \in H$ (i.e.,
$\|\varphi\|^2 = (\varphi, \varphi) = 1$). This vector is called a quantum state. The properties
(physical observables) $A$ and $B$ are described by symmetric operators $\hat{A}$ and
$\hat{B}$, respectively. Let $e_A = (\varphi_0, \varphi_1)$ and $e_B = (\psi_0, \psi_1)$ be two orthonormal
bases in $H$ consisting of eigenvectors of the operators $\hat{A}$ and $\hat{B}$, respectively.
The quantum state $\varphi$ can be represented in two ways:

$$\varphi = c_0 \varphi_0 + c_1 \varphi_1, \quad \text{where } c_0, c_1 \in \mathbf{C}, \ |c_0|^2 + |c_1|^2 = 1; \tag{1.1}$$

$$\varphi = d_0 \psi_0 + d_1 \psi_1, \quad \text{where } d_0, d_1 \in \mathbf{C}, \ |d_0|^2 + |d_1|^2 = 1. \tag{1.2}$$

By the probability interpretation of expansion (1.1) of the quantum state
$\varphi$ the probability $P(A = \alpha)$ $(= P_\varphi(A = \alpha))$ that $s \in S$ has the property
$A = \alpha$ is equal to $|c_\alpha|^2$. In the same way expansion (1.2) gives that
$P(B = \beta)$ $(= P_\varphi(B = \beta)) = |d_\beta|^2$. The possibility to expand one basis with respect
to the other basis induces a connection between the probabilities $P(A = \alpha)$ and
$P(B = \beta)$. Let us expand the vectors $\varphi_0$ and $\varphi_1$ with respect to the basis $e_B$:

$$\varphi_0 = u_{00} \psi_0 + u_{01} \psi_1, \quad \text{where } u_{0\beta} \in \mathbf{C}, \ |u_{00}|^2 + |u_{01}|^2 = 1; \tag{1.3}$$

$$\varphi_1 = u_{10} \psi_0 + u_{11} \psi_1, \quad \text{where } u_{1\beta} \in \mathbf{C}, \ |u_{10}|^2 + |u_{11}|^2 = 1. \tag{1.4}$$

Thus $d_0 = c_0 u_{00} + c_1 u_{10}$, $d_1 = c_0 u_{01} + c_1 u_{11}$ and we obtain the quantum rule
for transformation of probabilities:

$$P(B = \beta) = |c_0 u_{0\beta} + c_1 u_{1\beta}|^2, \quad \beta = 0, 1. \tag{1.5}$$

⁴ Quantum theory is not yet understood as well as e.g. classical mechanics or special
relativity. In particular, there is no precise definition of a quantum system. The only
way to extract the ‘quantum domain’ from the classical world is to use statistical properties
of quantum systems (which are, in fact, defined via these statistical properties).
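Rule (1.5) can be checked numerically. The following Python sketch computes the coefficients $d_\beta$ from $c_\alpha$ and $u_{\alpha\beta}$ and returns the probabilities $|d_\beta|^2$. The function name `quantum_rule` and the sample numbers (equal weights, bases rotated by 45 degrees against each other) are illustrative assumptions and are not taken from the text.

```python
import math


def quantum_rule(c, u):
    """Return [P(B=0), P(B=1)] according to rule (1.5).

    c[alpha]       -- coefficient c_alpha of the state in the basis e_A, see (1.1)
    u[alpha][beta] -- coefficient u_{alpha beta} of phi_alpha in the basis e_B,
                      see (1.3), (1.4)
    """
    d = [c[0] * u[0][beta] + c[1] * u[1][beta] for beta in (0, 1)]
    return [abs(d_beta) ** 2 for d_beta in d]


# Hypothetical input: |c_0|^2 = |c_1|^2 = 1/2, e_A rotated by 45 degrees against e_B.
r = 1 / math.sqrt(2)
c = [r, r]
u = [[r, r], [r, -r]]  # rows: phi_0 and phi_1 expanded in (psi_0, psi_1)
probs = quantum_rule(c, u)
print(probs)  # approximately [1.0, 0.0]
```

For these inputs the two contributions to $d_1$ cancel, so the value $B = 1$ is never obtained although both $\varphi_0$ and $\varphi_1$ have a nonzero component along $\psi_1$.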


3. Contradiction. On the other hand, by the probability interpretation
of expansions (1.3), (1.4) we obtain that $P(B = \beta / A = \alpha) = |u_{\alpha\beta}|^2$. Indeed,
in (1.3), (1.4) the quantum states $\varphi_\alpha$, $\alpha = 0, 1$, describe statistical ensembles
$S(A = \alpha)$ of physical systems which have the property $A = \alpha$. Therefore
the expansion of $\varphi_\alpha$ with respect to the basis $e_B$ gives the corresponding
probabilities for $B = \beta$ (under the condition that $A = \alpha$). By the formula of
total probability we obtain:

$$P(B = \beta) = \sum_{\alpha = 0, 1} P(A = \alpha) P(B = \beta / A = \alpha) = |c_0|^2 |u_{0\beta}|^2 + |c_1|^2 |u_{1\beta}|^2. \tag{1.6}$$

Thus in general ‘quantum rule’ (1.5) differs from ‘classical rule’ (1.6).
The standard viewpoint on the contradiction between (1.5) and (1.6) is that
it exhibits the non-Boolean structure (violation of Bayes’ formula)
of quantum statistical ensembles. It is typically pointed out that this
non-Boolean structure implies that it is impossible to consider a wave
function $\varphi$ as the description of a statistical ensemble $S$ of identically prepared
objects (not interacting with any preparation or measuring instruments).
However, careful analysis will show that this contradiction is a consequence of
formal manipulations with Kolmogorov probabilities. On one hand, this
contradiction need not imply the non-Boolean structure of quantum statistical
ensembles and in principle we need not use a new ‘quantum probabilistic
calculus’ for the description of quantum phenomena⁵. On the other hand,
we shall see that even a non-Boolean structure (in fact, nonexistence of
conditional probabilities and consequently the impossibility to use Bayes’ formula
or the formula of total probability) need not imply that a statistical ensemble
should have some ‘special quantum structure’. Such a non-Boolean structure
can be easily found in classical statistical phenomena.

⁵ One of the main aims of this book is to demonstrate that, in fact, there is no
difference between ‘quantum and classical probabilities’. Behaviour of probabilities related to
so-called quantum systems can be explained on the basis of classical probability theory
(which is, of course, not reduced to Kolmogorov’s measure-theoretical formalism). As

Remark 1.1. The reader can easily understand that (1.5) differs from
(1.6) only if the operators $\hat{A}$ and $\hat{B}$ do not commute. Observables (properties)
$A$, $B$ which are represented by noncommuting operators $\hat{A}$, $\hat{B}$ are said to be
incompatible. By the quantum formalism incompatible observables cannot
be measured simultaneously. Hence, for incompatible observables $A$, $B$ the
two dimensional probability distribution $P(\alpha, \beta) = P(A = \alpha, B = \beta)$ cannot
be defined on the basis of real physical measurements.
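The remark can be illustrated numerically. The sketch below evaluates the right-hand sides of (1.5) and (1.6) for two hypothetical choices of the coefficients $u_{\alpha\beta}$: the identity matrix (the bases $e_A$ and $e_B$ coincide, as happens for commuting operators with nondegenerate spectra) and a rotation by 45 degrees (noncommuting operators). All function names and sample numbers are assumptions made for illustration.

```python
import math


def quantum_rule(c, u):
    """Right-hand side of (1.5) for beta = 0, 1."""
    return [abs(c[0] * u[0][b] + c[1] * u[1][b]) ** 2 for b in (0, 1)]


def classical_rule(c, u):
    """Right-hand side of (1.6) for beta = 0, 1."""
    return [sum(abs(c[a]) ** 2 * abs(u[a][b]) ** 2 for a in (0, 1)) for b in (0, 1)]


def interference_term(c, u):
    """Difference (1.5) - (1.6), i.e. 2 Re(c_0 u_{0b} conj(c_1 u_{1b}))."""
    return [2 * (c[0] * u[0][b] * (c[1] * u[1][b]).conjugate()).real for b in (0, 1)]


r = 1 / math.sqrt(2)
c = [math.sqrt(0.3), math.sqrt(0.7)]  # hypothetical weights P(A=0) = 0.3, P(A=1) = 0.7
u_same = [[1.0, 0.0], [0.0, 1.0]]     # e_B coincides with e_A
u_rot = [[r, r], [r, -r]]             # e_B rotated by 45 degrees against e_A

same_q, same_c = quantum_rule(c, u_same), classical_rule(c, u_same)
rot_q, rot_c = quantum_rule(c, u_rot), classical_rule(c, u_rot)
print(same_q, same_c)  # the two rules agree
print(rot_q, rot_c)    # the two rules disagree
```

In the first case the cross term vanishes and both rules give $|c_\beta|^2$; in the second case the rules differ exactly by the cross term returned by `interference_term`.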

To find the roots of the contradiction between (1.5) and (1.6), we have to analyse
the meaning of the probabilities in formulas (1.5), (1.6). We shall do this in
section 3.


2 Physical interpretations of the wave function


There are many different physical interpretations of the wave function⁶ $\varphi$.
We discuss the most important interpretations.

1. Ensemble realist interpretation. Typically this interpretation is
called a statistical interpretation (following L. Ballentine, [8]). Here it
is assumed that $\varphi$ describes a statistical ensemble $S$ of identically prepared
objects $s$. Properties of $s$ are their objective properties. On the basis of this
interpretation, it is possible to keep to either i-realism or f-realism. However,
the main part of investigations on the basis of the statistical interpretation is
based on i-realism. It seems that, for example, A. Einstein was (more or less)
an adherent of the statistical interpretation (in the i-realist framework).


the operational definition of a quantum system is given by the description of statistical 
behaviour, the realization of such a program will imply that there is no crucial difference 
between classical and quantum physics. Therefore not only quantum systems may ex- 
hibit some ‘classical probabilistic features’, but also classical systems may exhibit some 
‘quantum probabilistic features’. In particular, quantum probabilistic behaviour can be 
exhibited by macrosystems. 

⁶ Of course, such a proliferation of paradigms is characteristic of a crisis in the
development of quantum theory, see [30], [33] for the details.

Here a wave function $\varphi$ describes probability distributions of properties
of elements $s$ of a statistical ensemble $S$. If we keep to i-realism, then these are
distributions of initial properties of these elements; if we keep to f-realism,
then these are distributions of final properties (obtained via measurement)
of these elements.

2. Individual realist interpretation. Here it is assumed that $\varphi$
describes not a statistical ensemble $S$, but an individual physical system $s$.
Properties of $s$ are considered as being objective.

3. Ensemble empiricist interpretation. Here it is assumed that
$\varphi$ describes a statistical ensemble which does not consist of quantum
systems, but merely of the preparation and measurement procedures (see
section 2) in which these quantum systems are involved. Probability
distributions which are related to this statistical ensemble are merely (see Example
1.3) distributions of these preparation and measurement procedures. This
interpretation is typically considered as a conventional interpretation of
quantum mechanics.

4. Individual empiricist interpretation. Here it is assumed that $\varphi$
describes an individual quantum system $s$. Thus $\varphi$ contains information on
probability distributions of possible reactions of $s$ to acts of measurement.
Many adherents of this interpretation keep to idealism and suppose that $s$
has no objective properties at all. ‘Properties’ of $s$ are connected only with
specific acts of measurement. This extreme form of the individual
empiricist interpretation is called an orthodox Copenhagen interpretation.
Typically N. Bohr is mentioned as one of the main adherents of the orthodox
Copenhagen interpretation. However, it seems that N. Bohr had no idealist
views on quantum reality. He was merely an adherent of empiricism (and he
balanced between the individual and ensemble interpretations).

5. Individual interpretation and probability. We shall not consider
in this book individual interpretations of $\varphi$. It seems that the use of subjective
probability is the only reasonable way to give a probabilistic foundation
for these interpretations. However, it would be rather strange to use such
an argument as a ‘measure of the personal belief’ as the cornerstone of a
fundamental physical theory. As has been mentioned in section 7, Chapter
1, at the moment the only real possibility to justify the use of subjective
probabilities is to reduce them to ensemble or frequency probabilities. Let
$\varphi$ describe the measure of our belief that, for example, the position $q$ of an
(individual) electron would be observed in a domain $D$. But how can this
measure of our belief be found? The only way is to use our ensemble or
frequency experience⁷.

Therefore we shall focus on ensemble interpretations. These
interpretations can be used on the basis of ensemble and frequency probability
theories. Ensemble probability theory provides the basis of the statistical
interpretation in the framework of i-realism. Frequency probability theory
must be used as the basis of the statistical interpretation in the f-realist
framework and of the ensemble empiricist interpretation.

Our main purpose is to demonstrate that the contradiction between the
quantum and classical probabilistic rules (which is often reduced to the
non-Boolean structure of quantum statistical ensembles) cannot be used as an
argument for or against any interpretation of quantum mechanics. Therefore
physicists have to solve their own problems and find new physical reasons
to choose the right physical interpretation of the wave function.

6. Preparation and measurement procedures. Many physicists
imagine a quantum measurement process as split into two procedures:

1) a preparation procedure $\mathcal{E}$, 2) a measurement procedure M.

A preparation procedure $\mathcal{E}$ produces an ensemble $S$ of quantum systems $s$
having a definite probability distribution of some property $A$. This property
can be considered as an objective property of quantum systems $s \in S$
(realism) or as merely a property of the preparation procedure $\mathcal{E}$ (empiricism).
In the latter case probabilities $P(A = \alpha)$ are fixed by $\mathcal{E}$. This statistical
information is algebraically coded in expansion (1.1) of the quantum state $\varphi$
which represents the ensemble $S$.

Probabilities $P(A = \alpha)$ have different meanings in different approaches
to quantum mechanics. By the statistical (ensemble realist) interpretation
in the i-realist framework, $P(A = \alpha) = P_S(A_i = \alpha)$ is the distribution of
initial values $A_i$ of the property $A$ in the ensemble $S$. By the same statistical
interpretation, but in the f-realist framework, $P(A = \alpha) = P_a(A_f = \alpha)$
is the distribution of final values $A_f$ of the property $A$ in the collective
$a = (\alpha_1, \alpha_2, \ldots, \alpha_k, \ldots)$ induced by measurements N of $A$ for $s \in S$. The
measurement N is not performed, because it would disturb the result of
the preparation procedure $\mathcal{E}$. However, in principle it can be done (to test
the functioning of the preparation procedure $\mathcal{E}$). If we keep to the ensemble
empiricist interpretation, then the probabilities $P(A = \alpha)$ also have the
frequency meaning. The only difference is that by the latter interpretation
values of $A$ are considered as properties of the preparation procedure
$\mathcal{E}$: $P(A = \alpha) = P_a(A_{\mathcal{E}} = \alpha)$.

⁷ Another possibility to deal with subjective probabilities is to consider them as
measures of information. However, even if probability is considered as subjective information,
it can again be reduced to ensemble or frequency probability for the distribution of ideas
in the brain.

Typically $\mathcal{E}$ is realized as a filter with respect to values of $A$. For
example, to prepare the statistical ensembles $S(A = \alpha)$, $\alpha = 0, 1$, (described
by pure states $\varphi_\alpha$) we can use filters $F_\alpha$ corresponding to fixed values⁸ $\alpha$ of
$A$.

The next step is a measurement procedure M. It is used for the
measurement of another property⁹ $B$ for elements $s$ of the ensemble $S$ which has
been prepared by $\mathcal{E}$ on the basis of the property $A$. The main feature of the
quantum mechanical formalism is that theoretically the probability
distribution for $B$ can be found on the basis of purely algebraic computations
via representation (1.2).

Remark 2.1. The formalism of quantum mechanics is used in the following
way. The preparation procedure $\mathcal{E}$ fixes probabilities $P_S(A = \alpha)$, $\alpha = 0, 1$.
Thus we can find the absolute values of the coefficients $c_\alpha$ in expansion (1.1) which
determine the quantum state $\varphi$ representing the ensemble $S$. On the other hand,
the probability formalism does not determine the phases $\theta_\alpha$ of the coefficients
$c_\alpha = e^{i\theta_\alpha} \sqrt{P_S(A = \alpha)}$. These phases must be found on the basis of other
physical reasons. Then we can expand $\varphi$ with respect to the other basis $e_B = (\psi_0, \psi_1)$
corresponding to fixed values of the property $B$ and find the coefficients $d_\beta$ of this
expansion. The quantities $|d_\beta|^2$ give theoretical values for the probabilities $P(B = \beta)$.
These theoretical values are compared with experimental values obtained on the
basis of the measurement M for the observable $B$.
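The role of the phases in this scheme can be seen in a short computation. The Python sketch below builds the coefficients $c_\alpha = e^{i\theta_\alpha}\sqrt{P_S(A = \alpha)}$ from given probabilities and phases and then evaluates $|d_\beta|^2$. The function names, the matrix `u` relating the two bases and the chosen phases are assumptions made for illustration; the text does not prescribe them.

```python
import cmath
import math


def state_from_preparation(p, theta):
    """Coefficients c_alpha = exp(i * theta_alpha) * sqrt(P_S(A = alpha))."""
    return [cmath.exp(1j * theta[a]) * math.sqrt(p[a]) for a in (0, 1)]


def predicted_probabilities(c, u):
    """|d_beta|^2 with d_beta = c_0 u_{0 beta} + c_1 u_{1 beta}."""
    return [abs(c[0] * u[0][b] + c[1] * u[1][b]) ** 2 for b in (0, 1)]


r = 1 / math.sqrt(2)
u = [[r, r], [r, -r]]  # hypothetical relation between the bases e_A and e_B
p = [0.5, 0.5]         # probabilities P_S(A = alpha) fixed by the preparation

# The same probabilities for A, three different choices of phases.
in_phase = predicted_probabilities(state_from_preparation(p, [0.0, 0.0]), u)
opposite = predicted_probabilities(state_from_preparation(p, [0.0, math.pi]), u)
quarter = predicted_probabilities(state_from_preparation(p, [0.0, math.pi / 2]), u)
print(in_phase, opposite, quarter)
```

The three states have identical distributions of $A$ but, for this choice of `u`, predict distributions of $B$ close to (1, 0), (0, 1) and (1/2, 1/2), respectively, which is why the phases have to be fixed by additional physical reasons.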


3 ‘Contradiction’ between quantum and classical probability calculi


In this section we demonstrate that one of the possible roots of the contradiction
between quantum rule (1.5) (non-Boolean probability theory) and classical
rule (1.6) (Boolean probability theory) is the identification of the conditional
probabilities $P(B = \beta / A = \alpha)$ for the quantum state $\varphi$, see (1.2), with the
probabilities $P(B = \beta)$ for the quantum states $\varphi_\alpha$, $\alpha = 0, 1$, see (1.3),
(1.4). Another possible root is the nonexistence of the conditional probabilities
$P(B = \beta / A = \alpha)$. In the latter case the theory is, of course, non-Boolean,
but nevertheless it is still ‘classical’ (in the sense that such behaviour can be
easily simulated for macrosystems).

⁸ Only particles with the property $A = \alpha$ can pass $F_\alpha$.

⁹ The values of $B$ obtained via M are considered (depending on the viewpoint) as $B_i$
(objective initial values), $B_f$ (objective final values) or $B_{\mathcal{E}/M}$ (values determined by the
preparation procedure $\mathcal{E}$ and the measurement M).

Thus the contradiction between quantum and classical rules (1.5) and
(1.6) is a consequence of the formal use of Kolmogorov’s axiomatics. As we
have already discussed, A.N. Kolmogorov eliminated the concrete structures
of probability spaces from his model. Thus there arose a rather mystical
symbol $P$ of abstract probability which was not related to any concrete statistical
ensemble $S$ or collective $x$. It seems that in quantum physics Kolmogorov’s
abstract probabilities are often used formally. This implies the identification of
(conditional) probabilities which are related to different ensembles or
collectives. However, such probabilities need not be equal. On the other hand,
Kolmogorov’s definition of conditional probabilities via Bayes’ formula
induces the opinion that (at least in ‘classical probability theory’) the existence
of probabilities $P(E_j)$, $j = 1, 2$, with $P(E_2) > 0$ must automatically imply the
existence of the conditional probability $P(E_1 / E_2)$. However, such an assumption
need not hold true for all statistical phenomena. In the ensemble probability
framework (see section 2, Chapter 1) we need not assume that the family
of all events $\mathcal{F}(\pi_S)$ (determined by the family $\pi_S$ of properties of elements
$s \in S$) is an algebra. Thus $E_1, E_2 \in \mathcal{F}(\pi_S)$ need not imply $E_1 \cap E_2 \in \mathcal{F}(\pi_S)$.
In that case the conditional probability $P(E_1 / E_2)$ cannot be defined by Bayes’
formula.

1. Ensemble approach: disturbance effects. Let us follow i-realism.
Both properties $A$ and $B$ are objective properties of elements of the
statistical ensemble $S$ represented by the quantum state $\varphi$. Measurements
give initial values of these properties, $A = A_i$, $B = B_i$. We can consider
sub-ensembles $S(A = \alpha)$ and $S(B = \beta)$, $\alpha, \beta = 0, 1$, of $S$ which consist of elements
$s$ having the properties $A = \alpha$ and $B = \beta$, respectively. By the ensemble
definition of probability $P_S(A = \alpha) = |S(A = \alpha)|/|S|$ and $P_S(B = \beta) =
|S(B = \beta)|/|S|$ and by the ensemble definition of the conditional probability

$$P_S(B = \beta / A = \alpha) = P_{S(A = \alpha)}(B = \beta) = \frac{|S(A = \alpha) \cap S(B = \beta)|}{|S(A = \alpha)|}. \tag{3.1}$$

We can use Bayes’ formula (and the formula of total probability) for these
probabilities. It seems that we should obtain the above contradiction.
However, there is one delicate point.

In general we cannot assume that the conditional probabilities
$P_S(B = \beta / A = \alpha) = P_{S(A = \alpha)}(B = \beta)$, $\alpha, \beta = 0, 1$, can be obtained from expansions
(1.3), (1.4). We cannot identify the sub-ensembles $S(A = \alpha)$, $S(B = \beta)$ of
$S$ with the ensembles $\mathcal{S}(A = \alpha)$, $\mathcal{S}(B = \beta)$ which are described by the quantum
states $\varphi_\alpha$ and $\psi_\beta$, respectively.

There are different preparation procedures $\mathcal{E}$, $\mathcal{E}(A = \alpha)$, $\mathcal{E}(B = \beta)$,
$\alpha, \beta = 0, 1$. They produce ensembles $S$, $\mathcal{S}(A = \alpha)$, $\mathcal{S}(B = \beta)$, respectively, which are
represented by quantum states $\varphi$, $\varphi_\alpha$, $\psi_\beta$, respectively. We cannot identify the
sub-ensembles $S(A = \alpha)$, $S(B = \beta)$ of $S$ with the ensembles $\mathcal{S}(A = \alpha)$,
$\mathcal{S}(B = \beta)$. For instance, the preparation procedures $\mathcal{E}(A = \alpha)$, $\alpha = 0, 1$, can be
realized as filters $F(A = \alpha)$ such that only quantum systems $s$ with the property
$A = \alpha$ can pass $F(A = \alpha)$. However, such a filtration changes the value of
the property $B$ for $s$. Thus in general

$$P_S(B = \beta / A = \alpha) = P_{S(A = \alpha)}(B = \beta) \ne P_{\mathcal{S}(A = \alpha)}(B = \beta).$$

Moreover, $P_S(B = \beta / A = \alpha)$ may not be well defined. Despite the fact
that $A, B \in \pi_S$, it may be that the set $\{A = \alpha\} \cap \{B = \beta\}$ is not described
by any property $C \in \pi_S$.

Such a phenomenon is not essentially nonclassical. An example of the
selection of a sub-ensemble on the basis of one fixed property which can
change the probability distribution of another property can be easily found
for classical systems. We can illustrate this problem by Example 1.1. Let
$S$ be an ensemble of bodies having different colours, $A = 0, 1$, and forms,
$B = 0, 1$. There are sub-ensembles $S(A = \alpha)$, $\alpha = 0, 1$, corresponding to
fixed colours and $S(B = \beta)$, $\beta = 0, 1$, corresponding to fixed forms. To
extract elements of $S$ having the fixed colour $\alpha$, we use a device $D_\alpha$ which
randomly changes the form of a body (some bodies of the form $B = 0$ are
transformed into bodies of the form $B = 1$ and vice versa). By this procedure
we obtain new ensembles $\mathcal{S}(A = \alpha)$, $\alpha = 0, 1$. Of course, the distributions of
$B$ in $\mathcal{S}(A = \alpha)$, $\alpha = 0, 1$, may differ from the initial distributions of $B$ in the
ensembles $S(A = \alpha)$, $\alpha = 0, 1$.
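This classical example is easy to simulate. The following Python sketch compares the frequency of the form $B = 0$ in the sub-ensemble $S(A = 0)$ with its frequency in the ensemble produced by a disturbing selection device. The composition of the ensemble, the flip probability 0.5 and all function names are hypothetical choices made only for illustration.

```python
import random


def sub_ensemble(S, colour):
    """S(A = colour): the bodies of S with the given colour, forms untouched."""
    return [(a, b) for (a, b) in S if a == colour]


def filtered_ensemble(S, colour, flip_probability, rng):
    """Ensemble produced by a device that selects bodies of the given colour
    and flips the form of each selected body with probability flip_probability."""
    selected = []
    for a, b in S:
        if a == colour:
            if rng.random() < flip_probability:
                b = 1 - b  # the device disturbs the form
            selected.append((a, b))
    return selected


def form_frequency(ensemble, form):
    """Relative frequency of B = form in the given ensemble."""
    return sum(1 for (_, b) in ensemble if b == form) / len(ensemble)


rng = random.Random(0)
# Hypothetical ensemble: bodies are pairs (colour, form); here colour fixes the form.
S = [(0, 0)] * 5000 + [(1, 1)] * 5000
p_sub = form_frequency(sub_ensemble(S, 0), 0)
p_filtered = form_frequency(filtered_ensemble(S, 0, 0.5, rng), 0)
print(p_sub, p_filtered)
```

In this sketch `p_sub` equals 1 while `p_filtered` is close to 1/2, so using the second number as the conditional probability in the formula of total probability for $S$ would give a wrong value, with no quantum system involved.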

Conclusion. The contradiction between ‘quantum and classical
probabilistic rules’ (1.5), (1.6) need not be attributed to a specific (‘nonclassical’)
behaviour of statistical ensembles of quantum systems. A possible root of
this contradiction is the formal use of Kolmogorov’s measure-theoretical
approach in which we do not control the relation between probabilities and
statistical ensembles. The identification of probabilities corresponding to different
statistical ensembles implies (in general) the use of wrong values, $|u_{\alpha\beta}|^2$, for the
conditional probabilities $P(B = \beta / A = \alpha)$ (which, in fact, must be calculated
on the basis of (3.1)). This induces the illusion of the violation of Bayes’
formula (and the formula of total probability) in the quantum formalism.

2. Ensemble approach: no conditional probabilities. It must
be pointed out that any quantum state $\varphi$ represents not a finite statistical
ensemble consisting of $N$ quantum systems, but an infinite ideal statistical
ensemble $S$. For any property $C$, probabilities $P(C = \gamma)$ are, in fact,
probabilities $P_S(C = \gamma)$ with respect to this infinite ensemble $S$. Of course, in
each concrete run $R = \{1, 2, \ldots, N\}$ of experiments we can obtain only a
finite statistical ensemble $S^{(R)}$ and ‘experimental ensemble probabilities’ are
probabilities (relative frequencies) with respect to $S^{(R)}$:

$$P_S^{(R)}(C = \gamma) = P_{S^{(R)}}(C = \gamma) = \frac{|\{s \in S^{(R)} : C = \gamma\}|}{|S^{(R)}|}.$$

For different runs $R$ and $R'$ these probabilities may differ considerably. The
main feature of quantum systems (as of many other physical systems) is that
these probabilities have the property of statistical stabilization, namely,
$\lim_{N \to \infty} P_{S^{(R)}}(C = \gamma) = |d_\gamma|^2$, where $d_\gamma$ are the coefficients in the expansion of
$\varphi$ with respect to the system of eigenvectors of the operator $\hat{C}$. These limiting
probabilities are probabilities with respect to the infinite ideal ensemble $S$:

$$|d_\gamma|^2 = P_S(C = \gamma) = \frac{|\{s \in S : C = \gamma\}|}{|S|}. \tag{3.2}$$

However, as the field of real numbers $\mathbf{R}$ does not contain actual infinities,
formula (3.2) has no meaning in the framework of real analysis. Instead of
(3.2), mathematicians (and, as a consequence, physicists) use the
measure-theoretical approach. However (despite the common opinion), this
approach cannot be used as a justification of the ensemble probability theory
even in the case of a countable ensemble $S = \{s_1, s_2, \ldots, s_k, \ldots\}$. For
example, let us try to define the uniform $\sigma$-additive probability on $S$:
$P(\{s_1\}) = P(\{s_2\}) = \cdots = P(\{s_k\}) = \cdots \ne 0$. Then
$P(S) = \sum_{j=1}^{\infty} P(\{s_j\}) = \infty$.

Such ‘pathological’ properties of the field of real numbers (the absence
of actual infinities) are one of the reasons to use the ensemble-frequency
interpretation instead of the purely ensemble interpretation. By the
ensemble-frequency interpretation a Kolmogorov probability space is based not on the
ensemble $S$ of quantum systems, but (roughly speaking) on the ensemble $\Omega$
of all possible (ideal infinite) runs of experiments. However, this approach
to the definition of a probability space was, in fact, never used by
physicists. They typically assume that a Kolmogorov probability space gives the
mathematical representation of an ensemble of quantum systems. One of the
main reasons to do so and to reject the ensemble-frequency approach is the
impossibility to construct a probability measure on the space of all runs for
measurements corresponding to incompatible properties.

One of the problems of the Kolmogorov axiomatics is that the probability $P$
must be closed (defined on a $\sigma$-algebra or at least an algebra of sets). Thus
if the probabilities of the events $\{A = \alpha\}$ and $\{B = \beta\}$ are well defined, then
automatically the probability of the event $\{A = \alpha\} \cap \{B = \beta\}$ must be well
defined. Hence the conditional probability $P(B = \beta / A = \alpha)$ must be well
defined in Kolmogorov’s framework. However, such conditional probabilities
are not observed. In principle, there are no reasons to assume that they are
even well defined for each quantum state $\varphi$. For example, why can we not
assume that the ensemble $S = \mathbf{N}$ and the ‘probability’ $P = \delta$, where $\delta$ is the
density of natural numbers? In such situations there is no Bayes’ formula at
all and the problem of the difference between ‘quantum and classical probability
rules’ is meaningless. Of course, these are non-Boolean models. However, this
non-Boolean structure of probabilities has no special nonclassical features.
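The density example can be made concrete by a small numerical experiment. The Python sketch below uses two sets of natural numbers, chosen here purely for illustration (they are an assumption of this sketch, not a construction taken from the text): $A$ is the set of even numbers, and $B$ contains the even numbers of the dyadic blocks $[2^m, 2^{m+1})$ with $m$ even and the odd numbers of the blocks with $m$ odd. Both sets have relative frequencies close to 1/2, while the relative frequency of $A \cap B$ keeps oscillating, so the intersection has no density.

```python
def in_A(n):
    """A: the even numbers."""
    return n % 2 == 0


def in_B(n):
    """B: even numbers of the blocks [2^m, 2^(m+1)) with m even,
    odd numbers of the blocks with m odd."""
    m = n.bit_length() - 1  # index of the dyadic block containing n
    return (n % 2 == 0) if m % 2 == 0 else (n % 2 == 1)


def both(n):
    """Indicator of the intersection of A and B."""
    return in_A(n) and in_B(n)


def relative_frequency(predicate, N):
    """Fraction of the numbers 1, ..., N which satisfy the predicate."""
    return sum(1 for n in range(1, N + 1) if predicate(n)) / N


for k in (6, 7, 8):
    N_low = 2 ** (2 * k + 1) - 1   # end of a block with m even
    N_high = 2 ** (2 * k + 2) - 1  # end of the following block with m odd
    print(N_low, relative_frequency(both, N_low),
          N_high, relative_frequency(both, N_high))
```

Along the first sequence of sample sizes the frequency of $A \cap B$ stays near 1/3, along the second near 1/6. Hence a ‘conditional probability’ of $B$ given $A$ cannot be defined through Bayes’ formula for this pair of sets, although nothing quantum is involved.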

A detailed analysis of the problem of the existence of conditional probabilities
will be presented in section 4.

3. Frequency probability viewpoint on the quantum probabilistic
rule. The frequency approach to probability gives more freedom than the
ensemble approach. Here we need not assume that properties of quantum
systems are objective¹⁰. Thus in principle we can consider various
combinations of objective and nonobjective properties of quantum systems.

We start with the general scheme in which we do not suppose that any
of the properties $A (= 0, 1)$ and $B (= 0, 1)$ has an objective character. Let
$\varphi \in H$ be a quantum state. As in section 1, we consider the two dimensional
Hilbert space $H$. Properties $A$ and $B$ are represented by symmetric operators
$\hat{A}$ and $\hat{B}$; $e_A = (\varphi_0, \varphi_1)$ and $e_B = (\psi_0, \psi_1)$ are orthonormal bases in $H$
consisting of eigenvectors of the operators $\hat{A}$ and $\hat{B}$, respectively. Thus we have:
$\varphi = c_0 \varphi_0 + c_1 \varphi_1 = d_0 \psi_0 + d_1 \psi_1$, where $c_0, c_1, d_0, d_1 \in \mathbf{C}$, and
$|c_0|^2 + |c_1|^2 = 1$, $|d_0|^2 + |d_1|^2 = 1$.

¹⁰ We recall (see section 2) that the nonobjective character of some properties (creation of
these properties in the process of a measurement) does not imply ‘essentially quantum
features’ of systems.

A quantum state $\varphi$ represents an ideal infinite ensemble $S$ of quantum
systems. This ensemble is characterized in the following way: the frequency
probability distribution of any property $C (= 0, 1)$ is given by the squares of the
absolute values of the coefficients in the expansion of $\varphi$ with respect to the system
of eigenvectors of the operator $\hat{C}$ representing $C$. In particular, we have:

(1) Any series of measurements M of the property $A$ for elements $s_j \in S$,
$j = 1, 2, \ldots$, induces a collective

$$a_M = (\alpha_1, \alpha_2, \ldots, \alpha_k, \ldots), \quad \alpha_j = 0, 1,$$

such that the frequency probabilities $P_{a_M}(\alpha) = \lim_{k \to \infty} \nu_k(\alpha; a_M)$ are equal to
$|c_\alpha|^2$, $\alpha = 0, 1$. Here, as usual, $\nu_k(\alpha; a_M) = n_k(\alpha; a_M)/k$ is the relative
frequency of realizations of the value $A = \alpha$.

(2) Any series of measurements M of the property $B$ for elements $s_j \in S$,
$j = 1, 2, \ldots$, induces a collective

$$b_M = (\beta_1, \beta_2, \ldots, \beta_k, \ldots), \quad \beta_j = 0, 1, \tag{3.3}$$

such that the frequency probabilities $P_{b_M}(\beta) = \lim_{k \to \infty} \nu_k(\beta; b_M)$ are equal to
$|d_\beta|^2$, $\beta = 0, 1$. Here, as usual, $\nu_k(\beta; b_M) = n_k(\beta; b_M)/k$ is the relative
frequency of realizations of the value $B = \beta$.

Remark 3.1. As we have already discussed, infinite statistical ensembles
could not arise in any real physical experiment. We always operate with finite
statistical ensembles (samples of finite lengths) $S_N$ which are prepared by some
preparation procedure $\mathcal{E}$ after $N$ steps. However, a quantum state $\varphi = \varphi_{\mathcal{E}}$
(corresponding to this preparation procedure) cannot be considered as a representation
of any of these finite ensembles $S_N$. A measurement for elements of $S_N$ gives only
a relative frequency, but not a probability. These frequencies may fluctuate when
$N$ is changed. Only asymptotically do the frequencies in $S_N$ approach the probabilities in
$S$. Different properties may exhibit different fluctuations of frequencies
before stabilization.
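The behaviour described in the remark can be imitated on a computer. In the Python sketch below a seeded pseudo-random sequence stands in for the outcomes of measurements on the finite ensembles $S_N$; this replacement of a collective by a pseudo-random sequence, the value 0.3 for the limiting probability and the function name are assumptions made only for illustration.

```python
import random


def relative_frequencies(p_one, sample_sizes, rng):
    """Relative frequency of the value 1 among the first N outcomes,
    for each N in sample_sizes (which must be in ascending order)."""
    freqs = {}
    count, drawn = 0, 0
    for N in sample_sizes:
        while drawn < N:
            count += rng.random() < p_one  # one simulated outcome, 0 or 1
            drawn += 1
        freqs[N] = count / N
    return freqs


rng = random.Random(1)
p_one = 0.3  # hypothetical limiting probability, playing the role of |c_1|^2
freqs = relative_frequencies(p_one, [10, 100, 1000, 100000], rng)
for N, nu in freqs.items():
    print(N, nu)
```

For small $N$ the printed frequencies typically deviate visibly from 0.3 and only the long samples come close to it. A finite simulation of this kind illustrates stabilization but of course cannot establish the existence of the limit.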

¹¹ I do not agree with the viewpoint of A. N. Kolmogorov: “The frequency concept based
on the notion of limiting frequency as the number of trials increases to infinity does not
contribute anything to substantiate the application of the results of probability theory to
real practical problems where we always have to deal with a finite number of trials.”

In the same way the quantum states $\varphi_\alpha = u_{\alpha 0} \psi_0 + u_{\alpha 1} \psi_1$, $\alpha = 0, 1$, describe
some ideal infinite statistical ensembles $\mathcal{S}(A = \alpha)$ of quantum systems. In
particular, these ensembles have the following frequency properties:

(1a) Any series of measurements M of the property $A$ for elements
$s_j \in \mathcal{S}(A = \alpha)$, $j = 1, 2, \ldots$, induces a collective

$$a_{\alpha M} = (\alpha_1, \alpha_2, \ldots, \alpha_k, \ldots), \quad \alpha_j = 0, 1,$$

such that the frequency probabilities $P_{a_{\alpha M}}(\alpha) = \lim_{k \to \infty} \nu_k(\alpha; a_{\alpha M}) = 1$ and
$P_{a_{\alpha M}}(1 - \alpha) = \lim_{k \to \infty} \nu_k(1 - \alpha; a_{\alpha M}) = 0$, $\alpha = 0, 1$.

(2a) Any series of measurements M of the property $B$ for elements
$s_j \in \mathcal{S}(A = \alpha)$, $j = 1, 2, \ldots$, induces a collective

$$b_{\alpha M} = (\beta_1, \beta_2, \ldots, \beta_k, \ldots), \quad \beta_j = 0, 1, \tag{3.4}$$

such that the frequency probabilities $P_{b_{\alpha M}}(\beta) = \lim_{k \to \infty} \nu_k(\beta; b_{\alpha M})$ are equal to
$|u_{\alpha\beta}|^2$, $\beta = 0, 1$.

As we have already seen, by the quantum calculus $P_{b_M}(\beta) = |d_\beta|^2 =
|c_0 u_{0\beta} + c_1 u_{1\beta}|^2$, $\beta = 0, 1$. As in the case of ensemble probabilities, if we forget about
the dependence of probabilities on collectives and identify in the formula of total
probability the conditional probabilities $P(\beta / \alpha) = P(B = \beta / A = \alpha)$, $\alpha, \beta = 0, 1$,
with the probabilities $P_{b_{\alpha M}}(\beta)$ (of the property $B = \beta$ in the collectives
$b_{\alpha M}$, $\alpha = 0, 1$), then we arrive at the contradiction (between classical and
quantum probability calculi). However, in the frequency approach there are
even fewer reasons for such an identification of probabilities than in the ensemble
approach. Moreover, there immediately arises the problem of the correct
frequency definition of the conditional probabilities $P(\beta / \alpha)$.

These conditional probabilities can be defined on the basis of the
operation of combining of collectives, see section 9, Chapter 1. However, it is not
clear which collectives

$$a = (\alpha_1, \alpha_2, \ldots, \alpha_k, \ldots), \quad \alpha_j = 0, 1,$$

and

$$b = (\beta_1, \beta_2, \ldots, \beta_k, \ldots), \quad \beta_j = 0, 1,$$

we have to combine to obtain conditional probabilities $P(\beta / \alpha)$ (which must
be equal to the probabilities $P_{b_{\alpha M}}(\beta)$ for the observed collectives (3.4)). In any case
the direct combination of the observed collectives $a_M$ and $b_M$ would not produce
such conditional probabilities, because these collectives are independent and
here $P(\beta / \alpha) = P_{b_M}(\beta) \ne P_{b_{\alpha M}}(\beta)$.

Moreover, we have shown (see section 9, Chapter 1) that (in the case
of strictly positive probabilities) the condition of combining of $a$ and $b$ is
equivalent to the existence of a two dimensional collective

$$z = (z_1, z_2, \ldots, z_k, \ldots), \quad z_j = (\alpha_j, \beta_j),$$

where $\alpha_j$ and $\beta_j$ are elements of $a$ and $b$, respectively. Hence the two
dimensional probability distribution

$$P_z(\alpha, \beta) = P_z(A = \alpha, B = \beta) = \lim_{k \to \infty} \nu_k((\alpha, \beta); z)$$

must be well defined. Here $\nu_k((\alpha, \beta); z) = n_k((\alpha, \beta); z)/k$, $\alpha, \beta = 0, 1$, are
the relative frequencies of the realization of the two dimensional labels $y = (\alpha, \beta)$
in the collective $z$. We recall that the two dimensional probability
distribution $P_z(\alpha, \beta)$ and the conditional probabilities are connected as
$P_z(\alpha, \beta) = P_z(\beta / \alpha) P_a(\alpha)$. Thus if we assume that the probabilities $P_a(\alpha)$ and the conditional
probabilities $P_z(\beta / \alpha)$ are obtained on the basis of observed quantities, then
we must assume that the two dimensional probability distribution $P_z(\alpha, \beta)$
can also be obtained on the basis of observed quantities. Hence we have to
assume the possibility of the simultaneous measurement of the pair
$(A = \alpha, B = \beta)$. However, if the properties $A$ and $B$ are incompatible (represented
by noncommuting operators $\hat{A}$, $\hat{B}$), then the existence of the simultaneous
distribution for $(A = \alpha, B = \beta)$ contradicts the quantum formalism.
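The relation between the two dimensional distribution and the conditional probabilities can be checked on a finite initial segment of a two dimensional collective. In the Python sketch below relative frequencies over a short list of pairs play the role of the limits; the list `z` and the function names are hypothetical and serve only as an illustration.

```python
from collections import Counter
from fractions import Fraction


def joint_frequencies(z):
    """Relative frequencies nu_k((alpha, beta); z) over the k = len(z) pairs."""
    counts = Counter(z)
    return {label: Fraction(n, len(z)) for label, n in counts.items()}


def marginal_frequency(z, alpha):
    """Relative frequency of A = alpha among the first components of z."""
    return Fraction(sum(1 for (a, _) in z if a == alpha), len(z))


def conditional_frequency(z, beta, alpha):
    """Relative frequency of B = beta among the pairs with A = alpha."""
    selected = [b for (a, b) in z if a == alpha]
    return Fraction(sum(1 for b in selected if b == beta), len(selected))


# Hypothetical finite initial segment of a two dimensional collective z.
z = [(0, 0), (0, 1), (1, 1), (0, 0), (1, 0), (0, 0), (1, 1), (0, 1)]
joint = joint_frequencies(z)
for (alpha, beta), nu in sorted(joint.items()):
    product = conditional_frequency(z, beta, alpha) * marginal_frequency(z, alpha)
    print((alpha, beta), nu, product)  # the last two columns coincide
```

The computation needs every element of `z` to carry both labels at once, which is exactly the simultaneous measurement of $A$ and $B$ that is not available for incompatible properties.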

Conclusion. In the frequency probability framework it seems to be
impossible to define conditional probabilities $P_z(B = \beta / A = \alpha)$ on the basis
of combining of collectives generated by observations of incompatible
properties $A$ and $B$. Therefore the formula of total probability cannot be used
for frequency physical probabilities.

4. Kolmogorov formalism and quantum measurements. We
now consider Kolmogorov’s (ensemble-frequency) interpretation of ‘quantum
probabilities’. If we use the abstract measure-theoretical formalism, then we
might identify some probabilities related to different probability spaces. This
implies the contradiction between ‘classical’ and ‘quantum’ probabilities. In
fact, different preparation procedures $\mathcal{E}$ are described by different probability
spaces $(\Omega_{\mathcal{E}}, \mathcal{F}_{\mathcal{E}}, P_{\mathcal{E}})$.

The quantum state $\phi = c_0 \phi_0 + c_1 \phi_1$, see (1.1), which describes the
ensemble $S$ (consisting of a statistical mixture of quantum systems with
property $A = \alpha = 0, 1$ with probabilities $P_\phi(A = \alpha) = |c_\alpha|^2$) is prepared
via a preparation procedure $\mathcal{E}$. It is described by a Kolmogorov probability
space $\mathcal{P} = (\Omega_{\mathcal{E}}, \mathcal{F}_{\mathcal{E}}, P_{\mathcal{E}})$. The states $\phi_\alpha$, $\alpha = 0, 1$, which describe ensembles
$S(A = \alpha)$ (consisting of quantum systems with the definite value $\alpha$ of the
property A) are prepared via other preparation procedures $\mathcal{E}_\alpha$ (filters with
respect to $A = \alpha$). These states must be described by other Kolmogorov
probability spaces $(\Omega_{\mathcal{E}_\alpha}, \mathcal{F}_{\mathcal{E}_\alpha}, P_{\mathcal{E}_\alpha})$.

Typically physicists apply the formula of total probability by mixing
conditional probabilities $P_{\mathcal{E}}(\beta/\alpha)$ with respect to the probability space $\mathcal{P}$
with probabilities $P_{\mathcal{E}_\alpha}(\beta)$, $\alpha = 0, 1$. Such a manipulation induces the
contradiction between ‘classical’ and ‘quantum’ probabilities.

5. Interference. Here we do not present statistical models which might
explain the interference phenomena on the basis of the corpuscular picture
(see [53], [54], [71]). We just want to illustrate our analysis of the notion of
probability in quantum formalism by the example of the two slit experiment.
This is the simplest experiment for demonstrating interference of light. There
is a point source of light O and two screens L and L’. The screen L contains
two slits $h_0$ and $h_1$. Light passes through L (through the slits) and finally reaches
the screen L’ where the interference rings are observed. The wave explanation
of the existence of interference rings is well known: the light reaching L’ can
travel by one of two routes, either through $h_0$ or through $h_1$; but the distances
travelled by light waves following these two paths are not equal and the light
waves do not generally arrive at the screen ‘in step’ with each other.

On the other hand, O is a source of quantum particles, photons. To
exclude the interaction between photons in a beam, we perform the
experiment with very weak light, so that at any time there is only one photon in
the region between O and L. The screen L’ is replaced by a photographic
plate or film (also denoted by L’). Individual spots appear on L’ more or
less chaotically. However, standard interference fringes appear after a
sufficiently long exposure. By this experiment we can compute the probability
distribution of spots on L’. Here the property $A = 0, 1$ is given by the slit
through which a photon passes. The property B is obtained by the discretization
of a measurement of the position on the screen L’. Let D be a domain
on the plane L’. We set $B = 1$ if a photon is observed in D and $B = 0$ if a
photon is observed in L’ \ D. The formal application of the formula of total
probability gives $P(B = \beta) = P(A = 0) P(B = \beta/A = 0) + P(A = 1) P(B = \beta/A = 1)$.
However, experimental data demonstrate the violation
of this equality. Physicists mostly interpret this violation as ‘nonclassical
behaviour’ of photons. They claim that a photon does not pass through one fixed
slit.
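The mismatch described above can be made concrete with the usual textbook wave computation. The following is only a minimal sketch under stated assumptions: each slit is idealised as a point source with a unit-modulus amplitude, the state with both slits open is taken as $(\phi_0 + \phi_1)/\sqrt{2}$, and $P(A = 0) = P(A = 1) = 1/2$. The geometry constants and all function names are hypothetical and are not taken from the text; the returned numbers are relative densities at a screen point, not normalised probabilities.

```python
import cmath
import math

WAVELENGTH = 1.0     # hypothetical units
SLIT_GAP = 5.0       # hypothetical distance between the slits h0 and h1
SCREEN_DIST = 100.0  # hypothetical distance between the screens L and L'


def amplitude(x, slit_x):
    """Unit-modulus amplitude at screen point x for a wave coming from one slit."""
    path = math.hypot(SCREEN_DIST, x - slit_x)
    return cmath.exp(2j * math.pi * path / WAVELENGTH)


def both_slits_open(x):
    """Relative density for the ensemble S (state (phi_0 + phi_1)/sqrt(2))."""
    psi0 = amplitude(x, -SLIT_GAP / 2)
    psi1 = amplitude(x, +SLIT_GAP / 2)
    return abs(psi0 + psi1) ** 2 / 2


def total_probability_mix(x):
    """Right-hand side of the formula of total probability, built from the
    one-slit ensembles S_0 and S_1 with weights 1/2 each."""
    psi0 = amplitude(x, -SLIT_GAP / 2)
    psi1 = amplitude(x, +SLIT_GAP / 2)
    return 0.5 * abs(psi0) ** 2 + 0.5 * abs(psi1) ** 2


for x in (0.0, 5.0, 10.0, 15.0):
    print(f"x={x:5.1f}  both open: {both_slits_open(x):.3f}  "
          f"mix: {total_probability_mix(x):.3f}")
```

In this idealisation the mixture is flat, while the density with both slits open varies between 0 and 2 across the screen. The sketch only exhibits the numerical gap between the two sides of the formal equality; it does not decide between the interpretations of that gap discussed in the text.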

Our ensemble analysis of the quantum formalism implies that we have to
consider three ensembles: (1) $S$ consists of all particles that pass through the
screen L when both slits are open; (2) $S_0$ consists of all particles that pass
through the screen L when the slit $h_0$ is open and the slit $h_1$ is closed; (3)
$S_1$ consists of all particles that pass through the screen L when the slit $h_1$ is
open and the slit $h_0$ is closed. These ensembles of particles are represented by
quantum states $\phi$ and $\phi_0$, $\phi_1$, respectively. In fact, probabilities which
physicists use in the formula of total probability for the two slit experiment
are related to different ensembles of particles: $P(A = \alpha) = P_S(A = \alpha)$ and
$P(B = \beta/A = \alpha) = P_{S_\alpha}(B = \beta)$. Of course, the formula of total probability
must hold true if instead of probabilities $P_{S_\alpha}(B = \beta)$ we use
probabilities $P_S(B = \beta/A = \alpha)$. However, to find the latter probabilities, we have to
use sub-ensembles $S^{\alpha}$, $\alpha = 0, 1$, of the ensemble $S$ consisting of particles that
pass through the slits $h_\alpha$. These sub-ensembles cannot be found without disturbing
the property B (because to find the slit, we have to perform an additional
measurement, which, of course, changes the distribution of B). Therefore
it is meaningless to discuss experimental verification of the formula of total
probability for the two slit experiment.


In the following frequency analysis we shall use the framework of
preparation/measurement. We consider preparation procedures $\mathcal{E}$, $\mathcal{E}_0$, $\mathcal{E}_1$
corresponding to the following configurations of open slits: $h_0$ and $h_1$ are open, $h_0$ is
open and $h_1$ is closed, $h_1$ is open and $h_0$ is closed. The measurement
procedure M is a measurement of the position B on the screen L’. Our frequency
analysis of the quantum formalism implies that we have to consider three
different collectives $b_M$, $b_{0M}$, $b_{1M}$ which are obtained by the measurement
M for particles prepared by $\mathcal{E}$, $\mathcal{E}_0$, $\mathcal{E}_1$, respectively. There are no reasons to
identify probabilities $P(B = \beta/A = \alpha)$ with frequency probabilities in the
collectives $b_{\alpha M}$, $\alpha = 0, 1$ (but typically they are identified). Moreover,
probabilities $P(B = \beta/A = \alpha)$ may not be well defined in the standard frequency
framework.

If we keep to i-realism, then our probability analysis of the two slit experiment
implies a nonlocal dependence on the equipment which is used for a preparation
procedure (if we do not want to accept the special ‘quantum probability rules’).
The difference between probability distributions of B in the ensembles $S^{\alpha}$ and
$S_\alpha$ implies that the local change of the experimental arrangement (open slit or
closed slit) has global consequences. For example, if we close the slit $h_0$, then
the behaviour of the interaction between a photon and the slit $h_1$ is also changed.
Thus, in fact, a photon interacts with the whole screen L. We obtain the same
consequence in the frequency framework.


However, if we keep to f-realism, then a problem of non-locality does not arise.
Here we do not assume that initial values $A_i$ of the property A are observed.
Probabilities given by quantum states $\phi_\alpha$ (by ensembles $S_\alpha$) are conditional probabilities
with respect to final values $A_f$ of A, $P(B = \beta/A_f = \alpha)$. But in the formula of
total probability for the quantum state $\phi$ (the ensemble $S$) we have to use
probabilities $P(A_i = \alpha)$ and $P(B = \beta/A_i = \alpha)$. The values $A_i$ for the quantum state
$\phi$ (the ensemble $S$) and the corresponding probabilities cannot be found in the
experimental framework. Analysis of such probability distributions with respect
to a ‘hidden property’ will be presented in section 4.

6. Non-ergodic interpretation of quantum mechanics. We now discuss
another delicate problem in the probabilistic foundations of quantum mechanics.
As has been pointed out, Kolmogorov’s (ensemble-frequency) interpretation of
probability implies identification of the ensemble and frequency approaches to
probability. As a consequence, it is always assumed that frequency probabilities in
the collective $b_M$, see (3.3), can be identified with probabilities with respect to the
ensemble $S$ of pairs $(a, u)$, where $a$ is a quantum particle and $u$ is the equipment
which is used to measure the property B of $a$.¹² However, this postulate has never
been tested experimentally¹³. The Brazilian physicist N. Buonumano proposed a
non-ergodic interpretation of quantum mechanics, [17]. By this interpretation frequency
probabilities need not coincide with $S$-ensemble probabilities. This implies that
in principle trials need not be independent (see [53], [54]). Thus there may be
correlations between $x_i$ and $x_j$, $i \neq j$, in $x$ (of course, in this case $x$ would not be
a Mises collective).


4 Probabilities with respect to objective conditions

The formalism of quantum mechanics implies that it is impossible to perform
experiments for a simultaneous measurement $(A = \alpha, B = \beta)$ of two
incompatible properties A and B of a quantum system¹⁴. Therefore two
dimensional probabilities $P(A = \alpha, B = \beta)$ (or equivalently conditional
probabilities $P(B = \beta/A = \alpha)$ for $P(A = \alpha) > 0$) cannot be found on the
experimental basis.

¹² Despite the fact that in all real experiments this collective is generated by the long
chain of successive experiments with the same equipment $u_0$.

¹³ The group of H. Rauch at the Atominstitut in Vienna did some indirect experiments in this
direction, see [96] (the most interesting experiment was performed by J. Summhammer,
[101]).

¹⁴ In fact, on the physical level incompatible properties are defined as properties for
which it is impossible to perform a simultaneous measurement. The representation of
such properties by noncommuting operators in a Hilbert space of quantum states is only
a consequence of such impossibility of simultaneous measurements.

However, it is still sensible to discuss the existence of such probabilities
if it is supposed that properties A, B are objective. Of course, the
simultaneous existence of two objective properties does not automatically imply the
possibility to perform a simultaneous measurement of these properties.

We shall study the most general situation. It is supposed that B is some
observed property (it may be objective or created by the act of a
measurement) and A is some objective property which cannot be observed
simultaneously with B. We shall study the problem of existence of conditional
probabilities $P(B = \beta/A = \alpha)$ in the frequency and ensemble frameworks.
The present scheme covers different approaches to properties of quantum
systems:

(i) We may keep to i-realism. Here the observed value of B coincides
with its initial value. Thus we study probabilities $P(B_i = \beta/A_i = \alpha)$ where
$A_i$ and $B_i$ are initial values of properties.

(f) We may keep to f-realism. Here the observed value $B_f$ of the property
B can differ from its initial value $B_i$. Thus we study probabilities
$P(B_f = \beta/A_i = \alpha)$.

(e) We may keep to empiricism (or even idealism). Here the property B
is created by the act of a measurement. It is meaningless to speak about
values of B before a measurement.

In fact, f-realism seems to be the most attractive. Here we can use
the preparation/measurement approach. Some preparation procedure $\mathcal{E}$
produces a statistical ensemble $S$ of particles with the definite probability
distribution for the property A (which is typically supposed to be objective).
Then a measurement M of another property B is performed for particles $s \in S$.
This measurement gives final values $B_f$ of the property B.

The main result of our considerations is that conditional probabilities
$P(B = \beta/A = \alpha)$ and two dimensional probabilities $P(A = \alpha, B = \beta)$ may
not exist (both in frequency and ensemble approaches). This result seems
to be rather strange from the Kolmogorov probability viewpoint.

Our models in which $P(B = \beta/A = \alpha)$ does not exist give examples of
(non-Kolmogorovean) probabilistic models without conditional probabilities
(probabilities are well defined, but conditional probabilities cannot be
defined). Here probability is not closed. It is defined on the system of events
which do not form a set algebra.

1. Frequency probabilities. To define frequency conditional probabilities
$P(B = \beta/A = \alpha)$, we must combine two collectives $a$ and $b$ corresponding
to values of A and B. We choose a collective $b = b_M$, see (3.3), induced by
a measurement M of the property B as one of the collectives for combining.
As the property A is objective, each element $s_j \in S$ has this property.
Hence parallel to the construction of the collective $b_M$ we can imagine the
process of construction of a ‘hidden sequence’


$a_h = (\alpha_1, \alpha_2, \ldots, \alpha_n, \ldots), \quad \alpha_j = 0, 1,$


where $\alpha_j = \alpha$ if the property A has the value $\alpha$ for a quantum system $s_j$.
Suppose that the sequence $a_h$ is a collective. We choose $a = a_h$ as another
collective for combining. Suppose that probability distributions $P_a(\alpha)$ and
$P_b(\beta)$ are strictly positive. Finally we suppose that collectives $a$ and $b$ are
combinable. Therefore the two dimensional sequence


$z = (z_1, z_2, \ldots, z_n, \ldots), \quad z_j = (\alpha_j, \beta_j),$


corresponding to these collectives is also a collective. Hence frequency
conditional probabilities $P_z(\beta/\alpha) = P_z(B = \beta/A = \alpha)$ are defined via the
standard scheme:

Suppose that there are $M_k(\alpha; z)$ elements with the first coordinate $\alpha$ among
the first $k$ elements of $z$, and there are $n_k(\beta/\alpha; z)$ elements with the second coordinate
$\beta$ among these $M_k(\alpha; z)$ elements. We introduce the relative frequencies:


$\nu_k(\alpha; z) = \frac{M_k(\alpha; z)}{k} \quad \text{and} \quad \nu_k(\beta/\alpha; z) = \frac{n_k(\beta/\alpha; z)}{M_k(\alpha; z)}.$


Conditional probability is defined as

$P_z(\beta/\alpha) = \lim_{k \to \infty} \nu_k(\beta/\alpha; z).$


This definition can be reformulated in the following way. For each fixed $\alpha = 0, 1$,
we choose a subsequence


$b_\alpha = (\beta_1, \beta_2, \ldots, \beta_n, \ldots), \quad \beta_j = 0, 1,$


of the sequence $z$ consisting of second coordinates of $z_j = (\alpha_j, \beta_j)$ with $\alpha_j = \alpha$.
Then $P_z(\beta/\alpha) = \lim_{k \to \infty} \nu_k(\beta; b_\alpha)$.
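The two formulations of the conditional relative frequency can be compared on a finite sample. The following is a minimal sketch, assuming a pseudo-random sequence of pairs built from independent coin flips purely for illustration; the function names are hypothetical, and a finite pseudo-random list is of course not a Mises collective.

```python
import random
from fractions import Fraction


def cond_freq(z, beta, alpha, k):
    """nu_k(beta/alpha; z) = n_k(beta/alpha; z) / M_k(alpha; z) over the first k pairs."""
    first_k = z[:k]
    m_count = sum(1 for (a_j, _) in first_k if a_j == alpha)
    n_count = sum(1 for (a_j, b_j) in first_k if a_j == alpha and b_j == beta)
    return Fraction(n_count, m_count) if m_count else None


def subsequence(z, alpha):
    """b_alpha: second coordinates of the pairs whose first coordinate equals alpha."""
    return [b_j for (a_j, b_j) in z if a_j == alpha]


def freq(seq, value, k):
    """Relative frequency of value among the first k elements of seq."""
    return Fraction(sum(1 for x in seq[:k] if x == value), k)


random.seed(0)
z = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(1000)]
k = len(z)

for alpha in (0, 1):
    b_alpha = subsequence(z, alpha)
    for beta in (0, 1):
        direct = cond_freq(z, beta, alpha, k)
        via_subsequence = freq(b_alpha, beta, len(b_alpha))
        print(alpha, beta, float(direct), direct == via_subsequence)
```

Exact rational arithmetic is used so that the two computations can be compared for equality rather than up to rounding.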
The conditional probability $P_z(\beta/\alpha)$ has the following meaning: it is
the probability to observe the value $\beta$ of the property B under the condition
that the hidden (but objectively existing) value of the property A is equal to
$\alpha$. The two dimensional probability distribution $P_z(\alpha, \beta) = P_a(\alpha) P_z(\beta/\alpha)$
is also well defined. This probability has the following physical meaning: it
is the probability that the hidden property $A = \alpha$ and the observed property
$B = \beta$.

2. No conditional probabilities, no Bayes’ formula. In principle
there may be some frequency ‘pathologies’. It may be that the hidden
sequence $a = a_h$ is not a collective, or it is a collective but the collectives $a$ and
$b$ are not combinable (physical experience implies that the sequence $b = b_M$
is always a collective). Let us analyse such a situation more carefully. To
simplify our considerations, we start with the case in which $a = a_h$ is a
collective. Here frequency probabilities $P_a(\alpha) = \lim_{k \to \infty} \nu_k(\alpha; a)$ are well defined.
However, we no longer assume that the collectives $a$ and $b$ are combinable.

We have $n_k(\beta; b) = n_k(\beta/0; z) + n_k(\beta/1; z)$. Thus we obtain


$\frac{n_k(\beta; b)}{k} = \frac{n_k(\beta/0; z)}{M_k(0; z)} \frac{M_k(0; z)}{k} + \frac{n_k(\beta/1; z)}{M_k(1; z)} \frac{M_k(1; z)}{k}.$


Since $M_k(\alpha; z)/k = \nu_k(\alpha; a)$, we have

$\nu_k(\beta; b) = \nu_k(\beta/0; z) \nu_k(0; a) + \nu_k(\beta/1; z) \nu_k(1; a).$ (4.1)


We also note that by our assumptions there exist $P_b(\beta) = \lim_{k \to \infty} \nu_k(\beta; b)$
and $P_a(\alpha) = \lim_{k \to \infty} \nu_k(\alpha; a)$. We ask the following question:

Is it possible that (despite the existence of the above limits and despite
equality (4.1)) the limits $\lim_{k \to \infty} \nu_k(\beta/\alpha; z)$ do not exist?

Yes, it is surely possible!

Example 4.1. Let $P_a(\alpha) = \lim_{k \to \infty} \nu_k(\alpha; a) = 1/2$, $\alpha = 0, 1$. As always
$\nu_k(0/\alpha; z) + \nu_k(1/\alpha; z) = 1$ for $\alpha = 0, 1$, it is possible to represent conditional
frequencies in the form


$\nu_k(0/\alpha; z) = \sin^2 \phi_{\alpha,k}, \quad \nu_k(1/\alpha; z) = \cos^2 \phi_{\alpha,k},$

where the phase $\phi_{\alpha,k} = \arcsin \sqrt{\nu_k(0/\alpha; z)}$. In the case of regular
conditional behaviour the angles $\phi_{\alpha,k}$ stabilize (mod $2\pi$) to some values $\phi_\alpha$ when
$k \to \infty$. Here conditional probabilities are well defined: $P_z(0/\alpha) = \sin^2 \phi_\alpha$
and $P_z(1/\alpha) = \cos^2 \phi_\alpha$. Equality (4.1) implies the formula of total
probability:


$P_b(0) = \frac{1}{2} (\sin^2 \phi_0 + \sin^2 \phi_1), \quad P_b(1) = \frac{1}{2} (\cos^2 \phi_0 + \cos^2 \phi_1).$


Let us consider now the case of irregular conditional behaviour. Here the angles
$\phi_{\alpha,k}$ do not stabilize (mod $2\pi$) when $k \to \infty$. But by (4.1) the limits


$P_b(0) = \frac{1}{2} \lim_{k \to \infty} (\sin^2 \phi_{0,k} + \sin^2 \phi_{1,k}), \quad P_b(1) = \frac{1}{2} \lim_{k \to \infty} (\cos^2 \phi_{0,k} + \cos^2 \phi_{1,k})$


must exist. For example, these conditions can be satisfied if we choose
$\phi_{1,k} \approx \frac{\pi}{2} - \phi_{0,k}$, $k \to \infty$. Thus there is no contradiction between nonexistence of
frequency conditional probabilities and formula (4.1).
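A concrete pair of sequences with this behaviour can be written down explicitly. The following is a hypothetical construction, not the one used in the text: the hidden values alternate 0, 1, and the observed value copies the hidden one in even-numbered blocks and flips it in odd-numbered blocks, with block lengths $2 \cdot 3^m$. The sequences are deterministic, so they illustrate only the statistical stabilization of frequencies, not the randomness required of a collective. All names are assumptions made for the sketch.

```python
from fractions import Fraction


def build_sequences(num_blocks):
    """Return (a, b, block_ends); block m has length 2 * 3**m."""
    a, b, block_ends = [], [], []
    for m in range(num_blocks):
        for j in range(2 * 3 ** m):
            alpha = j % 2                               # hidden value of A alternates
            beta = alpha if m % 2 == 0 else 1 - alpha   # B copies A or flips it
            a.append(alpha)
            b.append(beta)
        block_ends.append(len(a))
    return a, b, block_ends


def freq(seq, value, k):
    """nu_k(value; seq): relative frequency among the first k elements."""
    return Fraction(sum(1 for x in seq[:k] if x == value), k)


def cond_freq(a, b, beta, alpha, k):
    """nu_k(beta/alpha; z) for the paired sequence z_j = (a_j, b_j)."""
    selected = [b[j] for j in range(k) if a[j] == alpha]
    return Fraction(sum(1 for x in selected if x == beta), len(selected))


a, b, block_ends = build_sequences(9)
for k in block_ends:
    lhs = freq(b, 0, k)
    rhs = sum(cond_freq(a, b, 0, alpha, k) * freq(a, alpha, k) for alpha in (0, 1))
    assert lhs == rhs  # identity (4.1) holds exactly at every k
    print(k, float(freq(a, 0, k)), float(lhs), float(cond_freq(a, b, 0, 0, k)))
```

At the block ends both marginal frequencies equal 1/2 exactly, while the conditional frequency $\nu_k(0/0; z)$ keeps swinging between 1/4 and values close to 3/4, so it has no limit even though (4.1) is satisfied at every step.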

What is the physical meaning of fluctuations of conditional relative
frequencies $\nu_k(\beta/\alpha)$ (nonexistence of conditional probabilities $P_z(\beta/\alpha)$)?

A quantum state $\phi$ contains only information on the asymptotic behaviour
of frequencies for observations of each fixed property. However, $\phi$ does not
contain information on statistical relations between different properties. This
relation is given by conditional frequencies which are not determined by the
quantum formalism. Therefore in principle the behaviour of relative frequencies
in the statistical ensemble $S$ (represented by $\phi$) may be extremely irregular.
But these fluctuations of conditional frequencies may compensate one another
and give well defined frequency probabilities for observed properties.

A real preparation procedure $\mathcal{E}$ can produce (after N steps) only a finite
approximation $S_N$ of the (ideal infinite) ensemble $S$ represented by $\phi$.
Fluctuations of conditional frequencies imply that the statistical relation between
two properties A and B (or more precisely the reaction of a quantum system
$s$ with the fixed (hidden) value $\alpha$ of the property A to a measurement of the
property B) may strongly depend on the number N of experiments. Let
us consider again Example 4.1 and let


$\phi_{0,k} \approx \frac{\pi k}{2m}, \quad \phi_{1,k} \approx \frac{\pi}{2} - \frac{\pi k}{2m}, \quad k \to \infty,$


where $m > 1$ is a fixed natural number. Here the ‘conditional probabilities’


$P^k(0/0) = \nu_k(0/0; z) \approx \sin^2 \frac{\pi k}{2m}, \quad P^k(1/0) = \nu_k(1/0; z) \approx \cos^2 \frac{\pi k}{2m},$

$P^k(0/1) = \nu_k(0/1; z) \approx \cos^2 \frac{\pi k}{2m}, \quad P^k(1/1) = \nu_k(1/1; z) \approx \sin^2 \frac{\pi k}{2m}$


oscillate with the period $T = 2m$ when $k \to \infty$. Let $m$ be very large. Then,
for $k = 2mj + 1$, $P^k(0/0) = P^k(1/1) \approx 0$ and $P^k(1/0) = P^k(0/1) \approx 1$.
Therefore in the ensemble $S_k$ practically every quantum system $s$ having the
property $A = 0$ will exhibit the property $B = 1$ and practically every
quantum system $s$ having the property $A = 1$ will exhibit the property $B = 0$ (in
the measurement M of B). However, after $(m - 1)$ steps the statistical
conditional behaviour changes crucially. For $k = 2mj + m$, $P^k(0/0) = P^k(1/1) \approx 1$
and $P^k(1/0) = P^k(0/1) \approx 0$. Therefore in the ensemble $S_k$ practically
every quantum system $s$ having the property $A = 0$ will exhibit the property
$B = 0$ and practically every quantum system $s$ having the property $A = 1$
will exhibit the property $B = 1$. At the same time the observed ‘probabilities’
$P^k(B = \beta) = \nu_k(\beta; b)$ do not depend on these oscillations of conditional
probabilities.
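The oscillation can be tabulated directly from the phase formulas. This is a minimal numerical sketch that treats the approximate relations $\phi_{0,k} \approx \pi k/2m$ and $\phi_{1,k} \approx \pi/2 - \pi k/2m$ as exact and takes $\nu_k(\alpha; a) = 1/2$; the function names and the chosen values of $m$ and $j$ are hypothetical.

```python
import math


def conditional_table(k, m):
    """Return {(beta, alpha): P^k(beta/alpha)} for the phases of the example."""
    theta = math.pi * k / (2 * m)
    s, c = math.sin(theta) ** 2, math.cos(theta) ** 2
    return {(0, 0): s, (1, 0): c, (0, 1): c, (1, 1): s}


def observed(beta, k, m):
    """Right-hand side of (4.1) with nu_k(0; a) = nu_k(1; a) = 1/2."""
    table = conditional_table(k, m)
    return 0.5 * table[(beta, 0)] + 0.5 * table[(beta, 1)]


m, j = 1000, 3
for k in (2 * m * j + 1, 2 * m * j + m):
    table = conditional_table(k, m)
    print(k, round(table[(0, 0)], 6), round(table[(1, 0)], 6), observed(0, k, m))
```

For $k = 2mj + 1$ the value of $P^k(0/0)$ is close to 0, for $k = 2mj + m$ it is close to 1, and the observed frequency stays at 1/2 in both cases because $\sin^2 + \cos^2 = 1$.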

Remark 4.1. (On fluctuations of ensemble conditional probabilities)
The above arguments can be used for ensemble conditional probabilities. A
quantum state $\phi$ represents an infinite ideal ensemble $S$ of quantum systems. As we
have already discussed in section 3, real analysis does not give a possibility to use
the proportional definition of probability with respect to $S$. Typically probabilities
$P_S(B = \beta)$ with respect to $S$ are considered as limits of probabilities $P_{S_N}(B = \beta)$
with respect to finite approximations $S_N$ of $S$. Such an approach to
probabilities $P_S(B = \beta)$ is justified by the incredible number of quantum experiments.
However, it is often supposed that conditional probabilities $P_S(B = \beta/A = \alpha)$
can also be defined as limits of probabilities $P_{S_N}(B = \beta/A = \alpha)$ with respect
to finite approximations $S_N$ of $S$. Such an assumption has not been (and
probably never will be) verified experimentally. Example 4.1 (which can be used
in the ensemble framework) demonstrates that in principle conditional
probabilities $P_{S_N}(B = \beta/A = \alpha)$ may oscillate as N increases. In such a case
conditional probabilities $P_S(B = \beta/A = \alpha)$ cannot be defined. Therefore Bayes’
formula and the formula of total probability cannot be used for such quantum
states.

Finally we remark that it may be that the sequence $a = a_h$ is not a
collective. For example, if we keep to f-realism, then the statistical stabilization
of frequencies $\nu_k(A_f = \alpha)$ need not imply the statistical stabilization of
frequencies $\nu_k(A_i = \alpha)$.

Example 4.2. (Fluctuating probabilities and stabilized conditional
probabilities). Suppose that $\nu_k(0; a) \approx \sin^2 \phi_k$ and $\nu_k(1; a) \approx \cos^2 \phi_k$, $k \to \infty$.
If the phases $\phi_k$ do not stabilize (mod $2\pi$) when $k \to \infty$, then the frequencies
$\nu_k(\alpha; a)$ fluctuate when $k \to \infty$. Thus frequency probabilities $P_a(\alpha)$ do not
exist. Suppose that, however, frequency conditional probabilities
$P_z(\beta/\alpha) = \lim_{k \to \infty} \nu_k(\beta/\alpha; z)$ exist and they are equal to 1/2. Therefore sizes of
populations $S_{N,\alpha}$ with the fixed value $A = \alpha$ fluctuate, but reactions of quantum
systems $s \in S_{N,\alpha}$ to the measurement M of B are stable. In this case
we find that the limit in (4.1) exists:


$P_b(\beta) = \lim_{k \to \infty} \left( \nu_k(0; a) \nu_k(\beta/0; z) + \nu_k(1; a) \nu_k(\beta/1; z) \right) = 1/2, \quad \beta = 0, 1.$


Thus there is no contradiction between nonexistence of frequency
probabilities $P_a(\alpha)$ and formula (4.1).
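The situation of Example 4.2 can also be exhibited with explicit finite sequences. The following is a hypothetical construction, not taken from the text: the hidden value is constant on blocks of length $2 \cdot 3^m$ (0 on even-numbered blocks, 1 on odd-numbered ones), while the observed value simply alternates. As before, the sequences are deterministic and illustrate only the behaviour of frequencies; all names are assumptions.

```python
from fractions import Fraction


def build_sequences(num_blocks):
    """Return (a, b, block_ends); block m has length 2 * 3**m."""
    a, b, block_ends = [], [], []
    for m in range(num_blocks):
        for j in range(2 * 3 ** m):
            a.append(m % 2)  # hidden value of A is constant on each block
            b.append(j % 2)  # observed value of B alternates 0, 1
        block_ends.append(len(a))
    return a, b, block_ends


def freq(seq, value, k):
    """nu_k(value; seq): relative frequency among the first k elements."""
    return Fraction(sum(1 for x in seq[:k] if x == value), k)


def cond_freq(a, b, beta, alpha, k):
    """nu_k(beta/alpha; z) for z_j = (a_j, b_j); needs at least one a_j == alpha."""
    selected = [b[j] for j in range(k) if a[j] == alpha]
    return Fraction(sum(1 for x in selected if x == beta), len(selected))


a, b, block_ends = build_sequences(9)
for k in block_ends[1:]:  # skip the first block, which contains no a_j == 1
    print(k, float(freq(a, 0, k)),
          float(cond_freq(a, b, 0, 0, k)), float(cond_freq(a, b, 0, 1, k)),
          float(freq(b, 0, k)))
```

At the block ends the frequency $\nu_k(0; a)$ alternates between 1/4 and values close to 3/4, while both conditional frequencies and the observed frequency $\nu_k(0; b)$ equal 1/2.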


5 Einstein-Podolsky-Rosen paradox: probability, reality and locality


In the previous sections we have shown that there is no difference in principle
between ‘quantum’ and ‘classical’ probabilities and as a consequence between
classical and quantum systems. We can use ensemble or frequency definitions
of probability. However, we have to control the relation between
probabilities and ensembles or collectives. On the other hand, we cannot use the
conventional Kolmogorov formalism in which the structure of an ensemble or
a collective does not play any role. In principle, it is possible to consider
‘quantum properties’ as objective properties (by using both i-realism and
f-realism). Of course, probabilistic distributions of these properties depend
on ensembles or collectives.

However, there are some quantum experiments which seem to demon- 
strate that there is a large difference between classical and quantum sys- 
tems. All such experiments are based on the idea to eliminate disturbance 
effects by separating quantum systems in space-time (and to use correlations 
between these separated quantum systems). The starting point was the fa- 
mous Einstein-Podolsky-Rosen (EPR) experiment [36]. We present a brief 
description of this experiment. We start with the definition of Einsteinian
separability:

Two space-time regions U and V are said to be spatially separated, if the 
real factual situation within V is independent of what is done in U. 

We also remark that A. Einstein was (more or less) an adherent of
i-realism. Thus values of physical properties which will be discussed later
(namely, positions $q$ and momenta $p$) are initial values of these properties.

It should be noticed that the study of distinguishing features of ‘quantum
probabilities’ was not the original aim of EPR’s considerations. EPR wanted
to show that quantum mechanics is not a complete physical theory.
Completeness means that quantum mechanics provides a complete description of
the atomic and subatomic phenomena. The opinion that quantum mechanics
is complete (and hence we do not need a more detailed description of reality than
quantum mechanics) was, already at that time (1935), so deeply ingrained
in the minds of physicists that the EPR arguments against completeness
were soon referred to as a paradox. EPR wanted to demonstrate that there exist
elements of reality which cannot be described by a quantum state. Of
course, in this framework the question of the meaning of an element of
reality arose immediately. EPR thought that it would be impossible to
propose an exact definition of an element of reality. However, they proposed
the following criterion of reality:

“If, without in any way disturbing a system, we can predict with certainty
(i.e., with probability equal to unity) the value of a physical quantity, then
there exists an element of physical reality corresponding to this physical
quantity.”

EPR proposed the following arguments based on this criterion for ele- 
ments of reality and the notion of separability for two space-time regions U 
and V. 

Let us consider a statistical ensemble $S$ of pairs $(a^1, a^2)$ of correlated
particles. For example, these are pairs of particles which are emitted by excited
atoms. We consider the one dimensional model with particles moving in
opposite directions. For each pair the correlation implies the conservation
of the momentum, $p^{(1)} + p^{(2)} = 0$, and the relative position, $q^{(1)} - q^{(2)} = 0$
(correlations between properties of particles). For any pair $(a^1, a^2)$ of
correlated particles, we can measure the position $q^{(1)}$ of $a^1$ in U which (due to the
correlation) gives the position $q^{(2)}$ of $a^2$ (without disturbing $a^2$). Thus the
position $q^{(2)}$ of $a^2$ is an element of reality. By similar considerations we
obtain that the momentum $p^{(2)}$ of $a^2$ is an element of reality. On the other
hand, the quantum formalism implies the Heisenberg uncertainty relation


$\Delta q \, \Delta p \geq \hbar/2$


for any quantum state $\phi$. Thus the definite values of the position and
momentum of a quantum particle cannot be simultaneously elements of reality
for the same quantum state. EPR interpreted this as evidence of the
incompleteness of quantum mechanics: in the EPR experiment two elements
of reality (the position and momentum) exist simultaneously, but they cannot
be described by any quantum state (thus the formalism of quantum
mechanics does not provide the description of the whole physical reality).

The EPR considerations (which are often regarded as the paradox in the 
foundations of quantum mechanics) induced great debates (which were initi- 
ated by A. Einstein and N. Bohr). Numerous arguments were used by both 
sides. It is interesting to remark that at the first stage of these debates prob- 
ability reasons were not used. There was no analysis of the probability basis 
of the EPR considerations. In particular, nobody tried to study the problem 
of difference between ‘classical and quantum probabilities’ to disprove the 
simultaneous reality of the position and momentum. However, later such an
analysis was done, and one of its results was the famous
Bell’s inequality (see section 6).

I support the viewpoint that quantum mechanics is not complete. The 
incompleteness of quantum mechanics is rather a consequence of all physi- 
cal experience which demonstrated that no physical theory (at least before 
quantum mechanics) turned out to be universally valid. Every single theory 
was valid only if applied to a restricted part of reality, its domain of applica- 
tion. Do we have any reason to believe that quantum mechanics is different, 
and will hold true for whatever future experiments we may be able to think 
of? But at the same time I think that EPR arguments do not imply the 
conclusion that quantum mechanics is not complete. I am not satisfied by 
EPR’s criterion of reality. There are strong probabilistic arguments against 
this criterion. The meaning of ‘unit probability’ in this criterion is unclear. 
In fact, this ‘unit probability’ must depend on an ensemble or a collective. 
We shall see that it is impossible to find the same ensemble or collective for
the positions and momenta of particles $a^2$ (or particles $a^1$).

To save completeness of quantum mechanics, some physicists accept
nonlocality of space-time. They claim that, for example, a measurement of the
position of the particle $a^1$ located in Moscow changes properties of the
particle $a^2$ located in Vladivostok. Some of them assume the possibility of
faster-than-light influences (of course, such an assumption contradicts the
theory of relativity). Other adherents of nonlocality consider this nonlocality as
only information nonlocality. They think that, for example, a measurement
of the position $q^{(1)}$ of the particle $a^1$ does not change objective properties of
the particle $a^2$, but such a measurement changes our information about the
particle $a^2$. Hence they need not use faster-than-light influences.

Another group of physicists thinks that the root of the problem is the
realist viewpoint of EPR. If we reject realism and keep to empiricism (or
even idealism), then we cannot assign any physical meaning to values of
the position $q^{(2)}$ and momentum $p^{(2)}$ of the particle $a^2$ which are predicted
on the basis of measurements for the particle $a^1$. Adherents of empiricism
also have some differences in views. Some of them think that the root
of the problem is the impossibility of performing a simultaneous measurement
of the position $q^{(1)}$ and momentum $p^{(1)}$ for the same particle $a^1$ (and thus
obtaining the ‘simultaneous prediction’). Others keep to the rigid empiricist
line. They think that the root of the problem is the impossibility of performing
a simultaneous measurement of the position $q^{(2)}$ and momentum $p^{(2)}$ for the
same particle $a^2$.

As I have already mentioned, it seems that EPR arguments do not imply
incompleteness, nonlocality or impossibility to keep to realism. EPR
considerations imply only that we cannot manipulate formal (abstract)
probabilities which are not related to concrete ensembles or collectives.

By fixing a value $\alpha \in \mathbf{R}$ of $q^{(1)}$ we can construct an ensemble $S_\alpha$ of
particles $a^2$ for which $q^{(2)} = \alpha$ and the distribution of the momenta is the
same as in the original ensemble. Of course, the probability $P_{S_\alpha}(q^{(2)} = \alpha) = 1$.
Thus $q^{(2)} = \alpha$ is an element of reality for the ensemble $S_\alpha$. However,
$P_{S_\alpha}(p^{(2)} = \beta) \neq 1$ for any fixed value $\beta \in \mathbf{R}$. Thus $p^{(2)} = \beta$ is not an
element of reality for this ensemble.

In the same way by fixing the value $\beta \in \mathbf{R}$ of $p^{(1)}$ we can construct an
ensemble $R_\beta$ of particles $a^2$ for which $p^{(2)} = \beta$ and the distribution of the
positions is the same as in the original ensemble. The probability
$P_{R_\beta}(p^{(2)} = \beta) = 1$. Thus $p^{(2)} = \beta$ is an element of reality for the ensemble $R_\beta$. However,
$P_{R_\beta}(q^{(2)} = \alpha) \neq 1$ for any fixed value $\alpha \in \mathbf{R}$. Thus $q^{(2)} = \alpha$ is not an
element of reality for this ensemble. EPR did not present any idea how we
could construct an ensemble $W_{\alpha\beta}$ of particles $a^2$ such that $P_{W_{\alpha\beta}}(q^{(2)} = \alpha) = 1$
and $P_{W_{\alpha\beta}}(p^{(2)} = \beta) = 1$. Therefore the EPR arguments give no reason to
conclude that quantum mechanics is not complete. There are no reasons to
use nonlocality or to reject realism to explain the EPR arguments. It must
be pointed out that EPR arguments cannot be used as a ‘proof’ that
i-realism can (or even must) be used to describe quantum phenomena¹⁵. In
fact, from the mathematical point of view the only ‘argument’ of EPR was
that the notion of the unit probability can be used without connection to
concrete ensembles.

¹⁵ EPR obtained incompleteness of quantum mechanics by presenting the experiment
which demonstrates that both the position and momentum of a quantum particle can be
elements of reality for the same quantum state. This is often interpreted as a proof of the
possibility to use the i-realist approach in quantum mechanics.

Our previous considerations can be repeated in the frequency framework.
Here we can keep not only to i-realism, but also to f-realism in
the EPR scheme¹⁶. Let us perform a measurement $M_q$ of $q^{(1)}$ and a
measurement $M_p$ of $p^{(2)}$ and save only the results for which $q^{(1)} = \alpha$, where
$\alpha \in \mathbf{R}$ is some fixed value. We obtain the two dimensional collective:

$x_\alpha = (x_1, x_2, \ldots, x_k, \ldots), \quad x_j = (q_j^{(1)}, p_j^{(2)}),$ where $q_j^{(1)} = \alpha$.

As $q_j^{(2)} = q_j^{(1)} = \alpha$, we can (parallel to the construction of $x_\alpha$) construct
another two dimensional collective

$x'_\alpha = (x'_1, x'_2, \ldots, x'_k, \ldots), \quad x'_j = (q_j^{(2)}, p_j^{(2)}),$ where $q_j^{(2)} = \alpha$.

In the same way, for each fixed value $\beta \in \mathbf{R}$ of the momentum, we construct
two dimensional collectives

$y_\beta = (y_1, y_2, \ldots, y_k, \ldots), \quad y_j = (q_j^{(2)}, p_j^{(1)}),$ where $p_j^{(1)} = \beta$,

and

$y'_\beta = (y'_1, y'_2, \ldots, y'_k, \ldots), \quad y'_j = (q_j^{(2)}, p_j^{(2)}),$ where $p_j^{(2)} = \beta$.

Of course, $P_{x'_\alpha}(q^{(2)} = \alpha) = 1$ and $q^{(2)} = \alpha$ is the element of reality for the
collective $x'_\alpha$, and $P_{y'_\beta}(p^{(2)} = \beta) = 1$ and $p^{(2)} = \beta$ is the element of reality for the
collective $y'_\beta$. However, EPR did not present any idea how we can construct
a collective $z_{\alpha\beta}$ such that $P_{z_{\alpha\beta}}(q^{(2)} = \alpha) = 1$ and $P_{z_{\alpha\beta}}(p^{(2)} = \beta) = 1$.


6 Bell’s inequality for probabilities 


The EPR idea to consider statistical ensembles of correlated spatially
separated particles was developed by D. Bohm. He proposed a simpler example
in which it is possible to use discrete variables.

1. EPR experiment for spin. Instead of the position and momentum
of a quantum particle, he considered its spin$^{17}$ components. Let $s \in \mathbf{R}^3$ be
the spin of a quantum particle. For any axis $n \in \mathbf{R}^3$ we denote the projection
of $s$ to this axis by the symbol $s_n$ (i.e., $s_n = (s, n)/\|n\|$, where $(\cdot,\cdot)$ and
$\|\cdot\|$ are, respectively, the inner product and norm on $\mathbf{R}^3$). There exists an
equipment $M_n$ for measuring the spin projection $s_n$. However, such a measurement
disturbs a quantum particle and changes the spin. There are no measurement
devices $M_{n,n'}$ which can measure two components $\{s_n, s_{n'}\}$, $n \neq n'$,
simultaneously. However, for correlated particles $(a^1, a^2)$ (with spins $s^1, s^2$) we can
use the conservation law for the spins of these particles: $s^1 + s^2 = 0$. Hence
the measurement $M_n^1$ for $a^1$ gives automatically the value $s_n^2$ of the spin of
$a^2$. As usual, it is assumed that particles $a^1$ and $a^2$ satisfy the condition of
Einsteinian separability. By the EPR reality criterion we obtain that there
exists an element of reality corresponding to the spin component $s_n^2$ (and by
symmetry for the $s_n^1$) for any axis $n \in \mathbf{R}^3$. Therefore the spin $s$ is an element
of reality.

16 In the latter case $q^{(i)} = q_f^{(i)}$ and $p^{(i)} = p_f^{(i)}$, $i = 1, 2$, and
$q_f^{(1)} - q_f^{(2)} = 0$, $p_f^{(1)} + p_f^{(2)} = 0$.

17 The scientists whose interests are far from quantum physics may imagine spin as
an arrow $s \in \mathbf{R}^3$ which is associated with each quantum particle indicating the ‘internal
rotation’ of the particle.

By probabilistic reasons (discussed in the previous section) we do not ap- 
ply the EPR reality criterion. However, we can study the following problem: 

Is it possible to use realism for describing spin measurements 
for correlated particles? 

We restrict our consideration to the planar model. Here each direction $n$
can be characterized by an angle $\phi$: $n = n_\phi$. We set $s_\phi = \mathrm{sign}(s, n_\phi)$. In the
real physical model we have to use probabilities of simultaneous measurements
of $s^1_{\phi_i}$ and $s^2_{\phi_j}$ for three angles $\phi_i$, $i = 1, 2, 3$.$^{18}$ It is possible to obtain
some inequality for these probabilities, namely, Bell’s inequality. As there are
two particles $a^1$ and $a^2$, to describe the (planar) model we must use the four
dimensional Hilbert space. However, we can obtain the same results on the
basis of the two dimensional Hilbert space by using the following toy-model.

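Let $H$ be the two dimensional Hilbert space and let $e_\phi = (e_{\phi,+}, e_{\phi,-})$,
$\phi \in [0, 2\pi)$, be orthonormal bases in $H$ which are connected by the following
unitary transformation:

$$e_{\theta,+} = \cos(\theta - \phi)\, e_{\phi,+} + i \sin(\theta - \phi)\, e_{\phi,-}, \tag{6.1}$$

$$e_{\theta,-} = i \sin(\theta - \phi)\, e_{\phi,+} + \cos(\theta - \phi)\, e_{\phi,-}. \tag{6.2}$$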


We introduce the quantum state 


We consider observables (properties) $s_\phi$ corresponding to bases $e_\phi$:
$s_\phi e_{\phi,\pm} = \pm e_{\phi,\pm}$. By the probability interpretation of the quantum
state $\psi$ we have $P_\psi(s_\phi = \pm 1) = 1/2$.

If (as it is usually assumed by physicists) conditional probabilities
$P(\omega \in \Omega : s_\theta(\omega) = \epsilon / s_\phi(\omega) = \delta)$, $\epsilon, \delta = \pm 1$, are
identified with probabilities given by expansions (6.1), (6.2), then we obtain the
following ‘quantum probabilities’:

$$P_\psi(\omega \in \Omega : s_\theta(\omega) = +1, s_\phi(\omega) = -1) = P(\omega \in \Omega : s_\phi(\omega) = -1)\, P(\omega \in \Omega : s_\theta(\omega) = +1 / s_\phi(\omega) = -1) = \tfrac{1}{2} \sin^2(\theta - \phi) \tag{6.3}$$

and

$$P_\psi(\omega \in \Omega : s_\theta(\omega) = +1, s_\phi(\omega) = +1) = P(\omega \in \Omega : s_\phi(\omega) = +1)\, P(\omega \in \Omega : s_\theta(\omega) = +1 / s_\phi(\omega) = +1) = \tfrac{1}{2} \cos^2(\theta - \phi). \tag{6.4}$$

18 In fact, in experiments we need to use even four angles.

It is interesting to note that there is no contradiction between ‘quantum and 
classical probability rules’ in this model. The reader can easily check the 
validity of Bayes’ formula. 
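
The check suggested to the reader can be sketched numerically. The following minimal Python sketch assumes only the form of (6.3), (6.4) (marginal 1/2 times $\cos^2$ or $\sin^2$ of the angle difference); the function names `quantum_joint` and `bayes_holds` are ours and purely illustrative.

```python
import math
from itertools import product

def quantum_joint(theta, phi):
    """Joint 'quantum probabilities' P(s_theta = e, s_phi = d) in the form (6.3), (6.4)."""
    cos2 = math.cos(theta - phi) ** 2
    sin2 = math.sin(theta - phi) ** 2
    # marginal 1/2 times the transition probability read off from (6.1), (6.2)
    return {(e, d): 0.5 * (cos2 if e == d else sin2)
            for e, d in product((+1, -1), repeat=2)}

def bayes_holds(theta, phi, tol=1e-12):
    """Check normalisation, the marginals 1/2, and that the reverse conditional
    P(s_phi = d / s_theta = e) obtained from Bayes' formula is again cos^2 or sin^2."""
    joint = quantum_joint(theta, phi)
    cos2 = math.cos(theta - phi) ** 2
    sin2 = math.sin(theta - phi) ** 2
    p_theta = {e: joint[(e, +1)] + joint[(e, -1)] for e in (+1, -1)}
    p_phi = {d: joint[(+1, d)] + joint[(-1, d)] for d in (+1, -1)}
    ok = abs(sum(joint.values()) - 1.0) < tol
    for (e, d), p in joint.items():
        reverse = p / p_theta[e]                 # Bayes: P(d / e) = P(e, d) / P(e)
        expected = cos2 if e == d else sin2      # transition probability with roles swapped
        ok = ok and abs(reverse - expected) < tol
        ok = ok and abs(p_theta[e] - 0.5) < tol and abs(p_phi[d] - 0.5) < tol
    return ok
```

For any pair of angles, `bayes_holds(theta, phi)` should return `True`: for a single pair of observables the ‘quantum’ and classical rules agree.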

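2. Bell’s inequality for probabilities. We prove now some inequality
for events defined by three variables $s_\gamma(\omega)$, $\gamma = 0, \phi, \theta$. In fact, this
inequality does not depend on the form of the probability distributions of random
variables $s_\gamma(\omega)$. We shall use only the fact that there exists the Kolmogorov
probability space $\mathcal{P} = (\Omega, \mathcal{F}, P)$ on which these random variables are defined:

$$P(\omega \in \Omega : s_0(\omega) = +1, s_\phi(\omega) = +1) = P(\omega \in \Omega : s_0(\omega) = +1, s_\phi(\omega) = +1, s_\theta(\omega) = +1) + P(\omega \in \Omega : s_0(\omega) = +1, s_\phi(\omega) = +1, s_\theta(\omega) = -1), \tag{6.5}$$

$$P(\omega \in \Omega : s_\phi(\omega) = -1, s_\theta(\omega) = +1) = P(\omega \in \Omega : s_0(\omega) = +1, s_\phi(\omega) = -1, s_\theta(\omega) = +1) + P(\omega \in \Omega : s_0(\omega) = -1, s_\phi(\omega) = -1, s_\theta(\omega) = +1), \tag{6.6}$$

$$P(\omega \in \Omega : s_0(\omega) = +1, s_\theta(\omega) = +1) = P(\omega \in \Omega : s_0(\omega) = +1, s_\phi(\omega) = +1, s_\theta(\omega) = +1) + P(\omega \in \Omega : s_0(\omega) = +1, s_\phi(\omega) = -1, s_\theta(\omega) = +1). \tag{6.7}$$

If we add together the equations (6.5) and (6.6) we obtain

$$P(\omega \in \Omega : s_0(\omega) = +1, s_\phi(\omega) = +1) + P(\omega \in \Omega : s_\phi(\omega) = -1, s_\theta(\omega) = +1)$$
$$= P(\omega \in \Omega : s_0(\omega) = +1, s_\phi(\omega) = +1, s_\theta(\omega) = +1) + P(\omega \in \Omega : s_0(\omega) = +1, s_\phi(\omega) = +1, s_\theta(\omega) = -1)$$
$$+ P(\omega \in \Omega : s_0(\omega) = +1, s_\phi(\omega) = -1, s_\theta(\omega) = +1) + P(\omega \in \Omega : s_0(\omega) = -1, s_\phi(\omega) = -1, s_\theta(\omega) = +1). \tag{6.8}$$

But the first and the third terms on the right hand side of this equation are
just those which when added together make up the term
$P(\omega \in \Omega : s_0(\omega) = +1, s_\theta(\omega) = +1)$ (Kolmogorov probability is additive).
It therefore follows that:

$$P(\omega \in \Omega : s_0(\omega) = +1, s_\phi(\omega) = +1) + P(\omega \in \Omega : s_\phi(\omega) = -1, s_\theta(\omega) = +1)$$
$$= P(\omega \in \Omega : s_0(\omega) = +1, s_\theta(\omega) = +1) + P(\omega \in \Omega : s_0(\omega) = +1, s_\phi(\omega) = +1, s_\theta(\omega) = -1)$$
$$+ P(\omega \in \Omega : s_0(\omega) = -1, s_\phi(\omega) = -1, s_\theta(\omega) = +1). \tag{6.9}$$

By using nonnegativity of probability we obtain the inequality:

$$P(\omega \in \Omega : s_0(\omega) = +1, s_\phi(\omega) = +1) + P(\omega \in \Omega : s_\phi(\omega) = -1, s_\theta(\omega) = +1) \geq P(\omega \in \Omega : s_0(\omega) = +1, s_\theta(\omega) = +1), \tag{6.10}$$

which is a variant of Bell’s inequality (for probabilities).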
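
The derivation uses nothing but additivity and nonnegativity on a single probability space, so (6.10) can be checked for arbitrary joint distributions of the triple $(s_0, s_\phi, s_\theta)$. The sketch below is only an illustration under that assumption; the helper names `random_joint` and `bell_gap` are ours.

```python
import itertools
import random

TRIPLES = list(itertools.product((+1, -1), repeat=3))   # values of (s_0, s_phi, s_theta)

def random_joint(rng):
    """Random probability distribution on the eight triples (one Kolmogorov space)."""
    weights = [rng.random() for _ in TRIPLES]
    total = sum(weights)
    return {t: w / total for t, w in zip(TRIPLES, weights)}

def bell_gap(joint):
    """Left hand side minus right hand side of (6.10), all taken from the same joint law."""
    p_0_phi = sum(p for (s0, sp, st), p in joint.items() if s0 == +1 and sp == +1)
    p_phi_theta = sum(p for (s0, sp, st), p in joint.items() if sp == -1 and st == +1)
    p_0_theta = sum(p for (s0, sp, st), p in joint.items() if s0 == +1 and st == +1)
    return p_0_phi + p_phi_theta - p_0_theta

rng = random.Random(0)
min_gap = min(bell_gap(random_joint(rng)) for _ in range(10000))
```

By (6.9) the gap equals the sum of two three dimensional probabilities, so `min_gap` cannot be negative beyond rounding error.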

We turn back to physics and apply the inequality (6.10) to the ‘quantum
probabilities’ $P_\psi$, see (6.3), (6.4), which were computed in the framework
of quantum mechanics. We obtain: $\cos^2 \phi + \sin^2(\theta - \phi) \geq \cos^2 \theta$. Now set
$\phi = 3\theta$. We obtain: $g(\theta) = \cos^2 3\theta + \sin^2 2\theta - \cos^2 \theta \geq 0$. However, the
latter inequality holds only for sufficiently large angles $\theta$: $\theta \geq \pi/6$. Thus for
$0 < \theta < \pi/6$ the inequality (6.10) is violated.
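
The sign of $g(\theta)$ is easy to inspect numerically. The sketch below evaluates $g$ on a grid inside $(0, \pi/6)$ and compares it with the factorisation $g(\theta) = (1 - 2\cos 2\theta)\sin^2 2\theta$, which follows from the double-angle formulas and is added here only as a cross-check; the function names are ours.

```python
import math

def g(theta):
    """g(theta) = cos^2(3 theta) + sin^2(2 theta) - cos^2(theta)."""
    return math.cos(3 * theta) ** 2 + math.sin(2 * theta) ** 2 - math.cos(theta) ** 2

def g_factored(theta):
    """Equivalent form obtained from the double-angle identities."""
    return (1 - 2 * math.cos(2 * theta)) * math.sin(2 * theta) ** 2

samples = [k * math.pi / 600 for k in range(1, 100)]   # angles strictly inside (0, pi/6)
violated = all(g(t) < 0 for t in samples)               # (6.10) fails on the whole grid
most_negative = min(samples, key=g)                     # grid angle with the largest violation
```

The factor $1 - 2\cos 2\theta$ changes sign exactly at $\theta = \pi/6$, which is where the violation stops.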


7 Bell’s mystification


First of all we remark that the violation of Bell’s inequality for ‘quantum
probabilities’ may in principle be interpreted as evidence of violations of
quantum mechanical laws for the spin model. However, numerous experiments
were performed in connection with this problem, see, for example,
[5], [20]-[22], [47]. All these experiments demonstrated that quantum
mechanical laws hold true: experimental probabilities coincide (of course, with
some precision) with quantum probabilities
$P_\psi(s_\theta = \epsilon, s_\phi = \delta)$, $\epsilon, \delta = \pm 1$,
computed via (6.3), (6.4). Bell’s inequality for experimental probabilities is
violated.

1. Probability and reality. It is widely accepted by a part of the physical
community that the violation of Bell’s inequality has demonstrated that the
realist philosophy cannot be used for the description of quantum phenomena:
the spin is not an objective property of a quantum particle.

Remark 7.1. Another part of the physical community connects Bell’s inequality
and nonlocality of space-time: in the real physical experiments observables
$s_\theta = s_\theta^1$ and $s_\phi = s_\phi^2$ correspond to two particles which are separated in space.
However, our probabilistic analysis will demonstrate that there are no traces of
nonlocality in Bell’s framework. Therefore we will mainly concentrate our
considerations on the connection between Bell’s inequality and realism.

The problem of existence (reality) of spin is often mixed with the problem
of existence of random variables $s_\phi(\omega)$, $\phi \in [0, 2\pi]$, defined on some Kolmogorov
probability space. However, these are two different problems. Kolmogorov’s
model is just one of possible models of reality. Besides Kolmogorov’s model,
there exist frequency and ensemble models. We shall demonstrate that Bell’s
inequality is not present in the latter models. Thus there is no problem
with experimental violations of this inequality. The spin can be in principle
considered as an objective property of a quantum system. We shall show
that we can even keep to ‘real realism’, namely, i-realism.

2. Realism and Bell’s inequality. Let $s_\phi$, $\phi \in [0, 2\pi]$, be initial values.
We use the ensemble approach to probability. The main distinguishing
feature of this approach is that all probabilities in (6.5)-(6.7) depend on
corresponding ensembles. There are three ensembles
$S^N_{0\phi}, S^N_{\phi\theta}, S^N_{0\theta}$ (of cardinality $N$) which are used to obtain observed
probabilities $P_{S^N_{0\phi}}(s_0 = \pm 1, s_\phi = \pm 1)$,
$P_{S^N_{\phi\theta}}(s_\phi = \pm 1, s_\theta = \pm 1)$, $P_{S^N_{0\theta}}(s_0 = \pm 1, s_\theta = \pm 1)$.$^{19}$

19 In Kolmogorov’s model ensemble indexes are omitted. In fact, this manipulation, which
looks quite innocent, is the origin of Bell’s mystification.

Remark 7.2. Formally we could introduce an infinite ensemble $S$ of particles
and define ensemble probabilities with respect to $S$:

$$P_S(s_{\alpha_1} = \epsilon_1, s_{\alpha_2} = \epsilon_2, \ldots, s_{\alpha_n} = \epsilon_n) = \frac{|\{s \in S : s_{\alpha_1} = \epsilon_1, s_{\alpha_2} = \epsilon_2, \ldots, s_{\alpha_n} = \epsilon_n\}|}{|S|}, \tag{7.1}$$

where $\epsilon_j = \pm 1$, $n \in \mathbf{N}$. Of course, the calculations of section 6 can be repeated for
such probabilities. However, this can be done only formally. As we have already
mentioned, the proportional definition of probability is meaningless for infinite
ensembles in the framework of real analysis. Thus we could not perform these
formal calculations on the mathematical level of rigorousness.

Remark 7.3. The proportional ensemble definition (7.1) can be used on the
mathematical level of rigorousness on the basis of non-Archimedean analysis. For
example, in Chapter 4 we study p-adic ensemble probabilities. All arithmetical
calculations (6.5)-(6.9) can be performed in the field of p-adic numbers. But (6.9)
does not imply (6.10)! Some of the probabilities
$P_S(s_{\alpha_1} = \epsilon_1, s_{\alpha_2} = \epsilon_2, s_{\alpha_3} = \epsilon_3)$
can be negative! In fact, there is some hidden (and still unclear) logic in such an
appearance of negative probabilities in models in which the formal use of infinite
statistical ensembles is not justified (see Chapter 3).

Therefore we have to operate with finite ensembles $S^N_{\alpha\beta}$. The three
dimensional probabilities used in (6.5)-(6.7) must also be considered as
probabilities with respect to these ensembles. Thus in (6.5)-(6.7) we use probabilities
$P_{S^N_{0\phi}}(\ldots)$, $P_{S^N_{\phi\theta}}(\ldots)$, $P_{S^N_{0\theta}}(\ldots)$. Hence (6.5)+(6.6) gives

$$P_{S^N_{0\phi}}(s_0 = +1, s_\phi = +1) + P_{S^N_{\phi\theta}}(s_\phi = -1, s_\theta = +1)$$
$$= P_{S^N_{0\phi}}(s_0 = +1, s_\phi = +1, s_\theta = +1) + P_{S^N_{0\phi}}(s_0 = +1, s_\phi = +1, s_\theta = -1)$$
$$+ P_{S^N_{\phi\theta}}(s_0 = +1, s_\phi = -1, s_\theta = +1) + P_{S^N_{\phi\theta}}(s_0 = -1, s_\phi = -1, s_\theta = +1).$$

But, in contrast to the calculations with abstract Kolmogorov probabilities in
section 6, the first and the third terms on the right hand side of this equation
are not those which when added together make up the term

$$P_{S^N_{0\theta}}(s_0 = +1, s_\theta = +1) = P_{S^N_{0\theta}}(s_0 = +1, s_\phi = +1, s_\theta = +1) + P_{S^N_{0\theta}}(s_0 = +1, s_\phi = -1, s_\theta = +1).$$

To obtain (6.9), we have to identify $P_{S^N_{0\phi}}(s_0 = +1, s_\phi = +1, s_\theta = +1)$ and
$P_{S^N_{0\theta}}(s_0 = +1, s_\phi = +1, s_\theta = +1)$, $P_{S^N_{\phi\theta}}(s_0 = +1, s_\phi = -1, s_\theta = +1)$ and
$P_{S^N_{0\theta}}(s_0 = +1, s_\phi = -1, s_\theta = +1)$. But (and this is the crucial point!) there
are no reasons to do this in the general case.

For example, in quantum experiments with correlated particles it is
possible to measure only two dimensional probabilities
$P_{S^N_{\alpha\beta}}(s_\alpha = \pm 1, s_\beta = \pm 1)$ (by using correlations between particles).
The physical experience is that these ensemble probabilities stabilize when
$N \to \infty$. However, there are no reasons that three dimensional probabilities
$P_{S^N_{\alpha\beta}}(s_0 = \pm 1, s_\phi = \pm 1, s_\theta = \pm 1)$ must also stabilize when $N \to \infty$.
They could depend essentially on statistical ensembles. Therefore the
identification of probabilities with respect to different ensembles is not justified at all.
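
A small finite illustration of this point, under assumptions of our own: we build three ensembles of triples $(s_0, s_\phi, s_\theta)$ in which only the measured pair follows the two dimensional ‘quantum probabilities’ of the form (6.3), (6.4), while the unmeasured value is filled in arbitrarily (here with $+1$). The helper names and the choice $\theta = \pi/12$, $\phi = 3\theta$ are hypothetical.

```python
import math
from itertools import product

def quantum_pair(angle_a, angle_b):
    """P(s_a = x, s_b = y): 1/2 cos^2 for equal signs, 1/2 sin^2 otherwise."""
    cos2 = math.cos(angle_a - angle_b) ** 2
    sin2 = math.sin(angle_a - angle_b) ** 2
    return {(x, y): 0.5 * (cos2 if x == y else sin2)
            for x, y in product((+1, -1), repeat=2)}

def build_ensemble(pair_probs, positions, n):
    """Finite ensemble of triples; the two measured positions follow pair_probs
    (up to rounding), the unmeasured position is set to +1 (arbitrary choice)."""
    ensemble = []
    for (x, y), p in pair_probs.items():
        triple = [+1, +1, +1]
        triple[positions[0]], triple[positions[1]] = x, y
        ensemble += [tuple(triple)] * round(n * p)
    return ensemble

def prob(ensemble, fixed):
    """Proportional (ensemble) probability of the property {index: value, ...}."""
    hits = sum(all(e[i] == v for i, v in fixed.items()) for e in ensemble)
    return hits / len(ensemble)

theta = math.pi / 12
phi = 3 * theta
s_0_phi = build_ensemble(quantum_pair(0.0, phi), (0, 1), 10000)
s_phi_theta = build_ensemble(quantum_pair(phi, theta), (1, 2), 10000)
s_0_theta = build_ensemble(quantum_pair(0.0, theta), (0, 2), 10000)

lhs = prob(s_0_phi, {0: +1, 1: +1}) + prob(s_phi_theta, {1: -1, 2: +1})
rhs = prob(s_0_theta, {0: +1, 2: +1})
triple_in_0_phi = prob(s_0_phi, {0: +1, 1: +1, 2: +1})
triple_in_0_theta = prob(s_0_theta, {0: +1, 1: +1, 2: +1})
```

Here `lhs < rhs`, so the pairwise probabilities violate (6.10), while the three dimensional probabilities `triple_in_0_phi` and `triple_in_0_theta` differ: the identification needed for (6.9) simply does not hold for these ensembles.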

Practically the same considerations can be repeated in the framework
of von Mises’ frequency theory. There we have to consider three different
collectives, $x_{0\phi}, x_{\phi\theta}, x_{0\theta}$, instead of ensembles
$S^N_{0\phi}, S^N_{\phi\theta}, S^N_{0\theta}$. These are collectives
for two dimensional labels $(s_0, s_\phi)$, $(s_\phi, s_\theta)$, $(s_0, s_\theta)$. The principle of
statistical stabilization can be applied only to these labels. The frequencies
$\nu_N(s_\alpha = \pm 1, s_\beta = \pm 1; x_{\alpha\beta})$ stabilize when $N \to \infty$. However, the frequencies
$\nu_N(s_\alpha = \pm 1, s_\beta = \pm 1, s_\gamma = \pm 1; x_{\alpha\beta})$ need not stabilize when $N \to \infty$.
Moreover, they may be not defined at all. Therefore there is no Bell’s inequality
in von Mises’ probability model.

If we keep to f-realism, we have to use von Mises’ frequency probabil- 
ity theory. Therefore we could not obtain Bell’s inequality. There are no 
problems with violations of this inequality. 

Remark 7.4. Typically Bell’s inequality is associated with the use of so called
hidden variables, see section 9. As it has been noticed in [30], [33], it can be derived
without any reference to hidden variables. As the reader has seen, it was only
supposed that there exists a Kolmogorov probability space $\mathcal{P} = (\Omega, \mathcal{F}, P)$ such that
three spin projections $s_0, s_\phi, s_\theta$ can be represented by random variables on this
space. Under this assumption it is possible to define the joint probability
distribution $P_{ijk} = P(s_0 = i, s_\phi = j, s_\theta = k)$, $i, j, k = \pm 1$. On the other hand, the existence
of the joint probability distribution $P_{ijk}$ implies the existence of the Kolmogorov
space with $P = \{P_{ijk}\}$. This connection between Bell’s inequality and existence
of joint probability distribution was discussed by A. Fine [38], P. Rastall [95], W.
de Muynck and H. Martens [32] (see also [33]). Typically nonexistence of the joint
probability distribution is interpreted as the impossibility to use the objective
realism (at least its i-version). From our viewpoint this is just the impossibility to
apply the Kolmogorov model of probability theory (i.e., to use abstract symbolic
probabilities without regard to concrete ensembles or collectives). In the
frequency approach such nonexistence only demonstrates the absence of the statistical
stabilization for relative frequencies $\nu_N(s_0 = i, s_\phi = j, s_\theta = k)$ for three different
projections of spin. However, there are no (!) experimental reasons to suppose
such a stabilization. In the ensemble approach such nonexistence only
demonstrates the absence of the reproducibility of the ‘property’ $(s_0 = i, s_\phi = j, s_\theta = k)$
in statistical ensembles used for quantum experiments. However, there are no (!)
reasons to suppose such a reproducibility.


8 Bell’s inequality for covariations 


We have considered Bell’s inequality for probabilities. The original Bell’s 
inequality [10], [11] was proved for covariations. 

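Theorem 8.1. Let $\mathcal{P} = (\Omega, \mathcal{F}, P)$ be a Kolmogorov probability space and
$A, B, C \in RV(\mathcal{P})$ be discrete random variables, $A, B, C = \pm 1$. Then Bell’s
inequality

$$|\langle A, B \rangle - \langle C, B \rangle| \leq 1 - \langle A, C \rangle \tag{8.1}$$

holds true.
Proof. Set $\Delta = \langle A, B \rangle - \langle C, B \rangle$. By linearity of Lebesgue integral
we obtain

$$\Delta = \int_\Omega A(\omega) B(\omega)\, dP(\omega) - \int_\Omega C(\omega) B(\omega)\, dP(\omega) = \int_\Omega [A(\omega) - C(\omega)] B(\omega)\, dP(\omega). \tag{8.2}$$

As $A(\omega)^2 = 1$:

$$|\Delta| = \Big| \int_\Omega [1 - A(\omega) C(\omega)] A(\omega) B(\omega)\, dP(\omega) \Big| \tag{8.3}$$

$$\leq \int_\Omega [1 - A(\omega) C(\omega)]\, dP(\omega) = 1 - \langle A, C \rangle. \tag{8.4}$$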


Of course, this is the rigorous mathematical proof of (8.1) for Kolmogorov 
probabilities. However, as we have mentioned, Kolmogorov’s model does not 
provide the adequate description of some quantum measurements. The root 
of ‘Bell-Kolmogorov mystification’ is again the identification of probabilities 
corresponding to different statistical ensembles or collectives. 
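
As a sanity check of Theorem 8.1, the following sketch samples random joint distributions of $(A, B, C)$ on a single probability space and evaluates the slack $1 - \langle A,C \rangle - |\langle A,B \rangle - \langle C,B \rangle|$. It is an illustration only; the names `VALUES` and `bell_slack` are ours.

```python
import itertools
import random

VALUES = list(itertools.product((+1, -1), repeat=3))   # all triples (a, b, c)

def bell_slack(weights):
    """1 - <A,C> - |<A,B> - <C,B>| for the joint distribution proportional to weights."""
    total = sum(weights)

    def cov(i, j):
        # covariation of components i and j with respect to the same joint law
        return sum(w * v[i] * v[j] for w, v in zip(weights, VALUES)) / total

    return 1 - cov(0, 2) - abs(cov(0, 1) - cov(2, 1))

rng = random.Random(1)
worst = min(bell_slack([rng.random() for _ in VALUES]) for _ in range(10000))
```

Because all three covariations are computed from one distribution, `worst` stays nonnegative up to rounding, in agreement with (8.1).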
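Let us consider von Mises’ approach. The ensemble approach will be
considered in section 9 in connection with so called hidden variables. In the
frequency formalism the covariations $\langle A, B \rangle$ and $\langle C, B \rangle$ are covariations
with respect to two different collectives, $x_{AB}$ and $x_{CB}$:
$\langle A, B \rangle = \langle A, B \rangle_{x_{AB}}$ and $\langle C, B \rangle = \langle C, B \rangle_{x_{CB}}$. Thus

$$\langle A, B \rangle_{x_{AB}} \approx \frac{1}{N} \sum_{i=1}^{N} a_i b_i, \qquad \langle C, B \rangle_{x_{CB}} \approx \frac{1}{N} \sum_{i=1}^{N} c_i b_i'$$

(here $b_i$ and $b_i'$ are the values of $B$ observed in $x_{AB}$ and $x_{CB}$, respectively) and

$$\langle A, B \rangle_{x_{AB}} - \langle C, B \rangle_{x_{CB}} \approx \frac{1}{N} \sum_{i=1}^{N} [a_i b_i - c_i b_i'].$$

There are no reasons to suppose that

$$\sum_{i=1}^{N} [a_i b_i - c_i b_i'] = \sum_{i=1}^{N} [a_i - c_i] b_i. \tag{8.5}$$

Hence Bell made the mistake on the first step, (8.2), of the proof by using the
linearity of mean value with respect to two different collectives (or statistical
ensembles).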
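
A toy numerical illustration, with made-up finite pieces of the two collectives (the data and variable names below are hypothetical): the difference of two frequency covariations, each taken over its own collective, need not equal the single sum obtained by treating the values of $B$ as common to both.

```python
def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical finite pieces of two different collectives
x_ab = [(+1, +1)] * 4     # pairs (a_i, b_i) recorded in measurements of (A, B)
x_cb = [(+1, -1)] * 4     # pairs (c_i, b'_i) recorded in measurements of (C, B)

# Difference of the two frequency covariations, each over its own collective
two_collectives = mean([a * b for a, b in x_ab]) - mean([c * b2 for c, b2 in x_cb])

# What step (8.2) would give if the values b_i of x_ab were used in both terms
one_collective = mean([(a - c) * b for (a, b), (c, _) in zip(x_ab, x_cb)])
```

Here `two_collectives` equals 2 while `one_collective` equals 0, so the linearity step is not available once the values of $B$ come from different collectives.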

As physicists (with a few exceptions) did not see the probabilistic roots of
Bell’s misunderstanding, they tried to find some explanations of experimental
violations of Bell’s inequality:

1. Death of reality. It is impossible to keep to realism and suppose 
that quantum systems have objective properties. 

2. Nonlocality. As in quantum experiments, covariations are found via 
measurements for correlated particles which are separated in space, it can be 
supposed that ‘nonclassical’ behaviour of these covariations is a consequence 
of the dependence of the state of one particle on the state of other particle. 

Of course, these ideas could not be denied on the basis of our probability 
analysis. But our analysis has demonstrated that Bell’s arguments have no 
relation to these ideas. 


9 Hidden variables and Bell’s inequality 


1. Incompleteness of quantum mechanics. Theories based on so called
hidden variables were developed starting with the hypothesis on incompleteness
of quantum mechanics. Typically incompleteness of quantum mechanics
is considered as a consequence of the EPR arguments. By these arguments
both position and momentum (or projections of spin to different axes) of each
quantum particle are elements of reality for the system of two correlated
particles. However, the quantum formalism does not describe the simultaneous
existence of these elements of reality. Thus quantum mechanics is not
complete. However, we have demonstrated that the EPR arguments are based
on the formal use of Kolmogorov abstract probabilities. Of course, such
‘arguments’ could not be considered as the proof of incompleteness of
quantum mechanics. Nevertheless, incompleteness of quantum mechanics can be
directly obtained as a consequence of keeping to realist philosophy.

2. Hidden variables. Let us suppose that quantum mechanics is not
complete. There could be a finer description of reality than that given by quantum
mechanics. In principle there could exist some additional variables $\lambda$, hidden
variables, such that by specifying the value $\lambda = \lambda_0$ of $\lambda$ we could determine
the values of all physical observables: $\lambda_0 \to A(\lambda_0)$, $\lambda_0 \to B(\lambda_0), \ldots$.
Compatibility or incompatibility of physical observables $A, B, \ldots$ does not play any
role.

Typically Bell’s inequality is considered in the framework of hidden
variables. The Kolmogorov probability space $\mathcal{P} = (\Omega, \mathcal{F}, P)$ which was used
in section 8 has the following interpretation: $\Omega = \Lambda$ is the set of hidden
variables, $\omega = \lambda$, $P$ is the probability distribution of hidden variables. The
experimental violations of Bell’s inequality are interpreted as the evidence
that such hidden variables do not exist. Other authors use nonlocality
arguments. They think that, despite the nonexistence of local hidden variables,
nonlocal hidden variables can exist.

However, all our probability arguments against Bell’s inequality can be 
repeated for hidden variables. Let us use von Mises’ frequency approach. As 
we have already seen in section 8, Bell’s mistake is the assumption on the 
validity of equality (8.5). 

3. Deterministic hidden variables model and Bell’s mystification.
To simplify our considerations, we suppose that the set of hidden
variables is finite:

$$\Lambda = \{\lambda_1, \ldots, \lambda_M\}.$$

For each physical observable $U$, the value $\lambda$ of hidden variables determines
the value

$$U = U(\lambda).$$

Here we keep to realism. It is possible to keep to i-realism or f-realism. If
we keep to i-realism in this model, we have to assume that the result of
measurement does not depend on fluctuations of an internal state $\omega$ of a
measurement device $M_U$ (see the next section for the model with such a
dependence).

Let $U$ and $V$ be physical observables, $U, V = \pm 1$. We start with the
consideration of the frequency (experimental) covariation $\langle U, V \rangle_{x_{UV}}$ with
respect to a collective $x_{UV}$ induced by measurements of the pair $(U, V)$. The
$x_{UV}$ is obtained by measurements for an ensemble $S_{UV}$ of physical systems
(for example, pairs of correlated quantum particles). Our aim is to represent
experimental covariation $\langle U, V \rangle_{x_{UV}}$ as ensemble covariation $\langle U, V \rangle_{S_{UV}}$.
Then we shall demonstrate that in the general case it is impossible to perform
for ensemble covariations Bell’s calculations, (8.2)-(8.4), which have been
performed for Kolmogorov covariations. Let $S_{UV} = \{d_1, \ldots, d_N\}$, where the $i$th
measurement is performed for the system $d_i$. Define a function $i \to \lambda(i)$, the
value of hidden variables for $d_i$. We set
$n_k(S_{UV}) = |\{d_i \in S_{UV} : \lambda(i) = \lambda_k\}|$
and $p_k^{UV} = P_{S_{UV}}(\lambda = \lambda_k) = n_k(S_{UV})/N$. We have

$$\langle U, V \rangle_{x_{UV}} = \frac{1}{N} \sum_{i=1}^{N} U(\lambda(i)) V(\lambda(i)) = \frac{1}{N} \sum_{k=1}^{M} n_k(S_{UV})\, u_k v_k = \sum_{k=1}^{M} p_k^{UV} u_k v_k = \langle U, V \rangle_{S_{UV}},$$

where $u_k = U(\lambda_k)$, $v_k = V(\lambda_k)$. Thus

$$\Delta = \langle A, B \rangle_{x_{AB}} - \langle C, B \rangle_{x_{CB}} = \langle A, B \rangle_{S_{AB}} - \langle C, B \rangle_{S_{CB}} = \sum_{k=1}^{M} (p_k^{AB} a_k - p_k^{CB} c_k)\, b_k$$

and

$$\langle A, C \rangle_{x_{AC}} = \langle A, C \rangle_{S_{AC}} = \sum_{k=1}^{M} p_k^{AC} a_k c_k.$$

We suppose now that probabilities of $\lambda_k$ do not depend on statistical
ensembles:

$$p_k = p_k^{AB} = p_k^{CB} = p_k^{AC} \tag{9.1}$$

(later we shall modify this condition to obtain statistical coincidence of
probabilities, instead of the precise coincidence). Hence

$$\Delta = \sum_{k=1}^{M} p_k (a_k - c_k)\, b_k \quad \text{and} \quad \langle A, C \rangle_{x_{AC}} = \sum_{k=1}^{M} p_k a_k c_k.$$

We can now apply Theorem 8.1 for the discrete probability distribution
$\{p_k\}_{k=1}^{M}$ and obtain Bell’s inequality.

However, if condition (9.1) does not hold true, then equality (8.2) and, as
a consequence, Bell’s inequality can be violated. The violation of condition
(9.1) is the exhibition of unstable (with respect to the real metric) statistical
structure on the level of hidden variables of (at least some) quantum
ensembles. In particular, the principle of the statistical stabilization (‘the law of
the large numbers’) can be violated for hidden variables: the limits
$\lim_{N \to \infty} \nu_N(\lambda_k)$ do not exist. Thus we could not introduce the probability
distribution on the set of hidden labels $\Lambda$.$^{20}$

Nevertheless, we obtained the following mathematical result: 

Theorem 9.1. Let statistical ensembles satisfy condition (9.1). Then 
Bell’s inequality (8.1) holds true. 
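
The role of condition (9.1) can be seen on a toy deterministic model. In the sketch below the set $\Lambda = \{0, 1, 2\}$, the observables `A`, `B`, `C` and the ensembles are hypothetical choices made only for illustration; an ensemble is represented by the list of hidden-variable values of its elements.

```python
from collections import Counter

def ensemble_cov(lams, U, V):
    """<U,V>_S = (1/N) sum_i U(lam(i)) V(lam(i)) for an ensemble given by its lam values."""
    return sum(U[l] * V[l] for l in lams) / len(lams)

def weighted_cov(lams, U, V):
    """The same covariation written as sum_k p_k u_k v_k with p_k = n_k / N."""
    counts = Counter(lams)
    return sum(counts[l] / len(lams) * U[l] * V[l] for l in counts)

# Hypothetical observables on Lambda = {0, 1, 2}, all with values +1 or -1
A = {0: +1, 1: +1, 2: -1}
B = {0: +1, 1: -1, 2: -1}
C = {0: -1, 1: +1, 2: +1}

def bell_slack(s_ab, s_cb, s_ac):
    """1 - <A,C>_{S_AC} - |<A,B>_{S_AB} - <C,B>_{S_CB}|; negative means (8.1) fails."""
    delta = ensemble_cov(s_ab, A, B) - ensemble_cov(s_cb, C, B)
    return 1 - ensemble_cov(s_ac, A, C) - abs(delta)

same = [0, 1, 2, 2]
slack_same = bell_slack(same, same, same)            # condition (9.1) holds
slack_diff = bell_slack([0] * 4, [1] * 4, [1] * 4)   # distributions differ between ensembles
```

With a common distribution of $\lambda$ the slack is nonnegative, as Theorem 9.1 states; with ensemble-dependent distributions the same observables give `slack_diff < 0`.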

We introduce now a statistical analogue of the precise coincidence of
ensemble probabilities for hidden variables. Let $\mathcal{E}_1, \mathcal{E}_2$ be two ensembles of
physical systems and let $\pi$ be a property of elements of these ensembles. The
$\pi$ has values $(\alpha_1, \ldots, \alpha_m)$. We define

$$\delta_\pi(\mathcal{E}_1, \mathcal{E}_2) = \sum_{i=1}^{m} |P_{\mathcal{E}_1}(\alpha_i) - P_{\mathcal{E}_2}(\alpha_i)|,$$

where $P_{\mathcal{E}}(\alpha_i) = |\{e \in \mathcal{E} : \pi(e) = \alpha_i\}| / |\mathcal{E}|$ are ensemble probabilities. We remark that
the function $\delta_\pi$ is a pseudometric on the set of all ensembles whose elements
have the property $\pi$: (1) $\delta_\pi(\mathcal{E}_1, \mathcal{E}_2) \geq 0$; (2) $\delta_\pi(\mathcal{E}_1, \mathcal{E}_2) = \delta_\pi(\mathcal{E}_2, \mathcal{E}_1)$; (3)
$\delta_\pi(\mathcal{E}_1, \mathcal{E}_2) \leq \delta_\pi(\mathcal{E}_1, \mathcal{E}_3) + \delta_\pi(\mathcal{E}_3, \mathcal{E}_2)$. The distance $\delta_\pi(\mathcal{E}_1, \mathcal{E}_2) = 0$ iff ensembles
$\mathcal{E}_1$ and $\mathcal{E}_2$ have the same probability distribution of property $\pi$:
$P_{\mathcal{E}_1}(\alpha_i) = P_{\mathcal{E}_2}(\alpha_i)$, $i = 1, 2, \ldots, m$.


20 All our considerations were based on the statistical stabilization with respect to the
real metric. In Chapter 4 we shall consider the statistical stabilization with respect to a
p-adic metric. It may be that some ensembles of hidden variables which are unstable with
respect to the real metric are stable with respect to the p-adic metric, see [58]-[60].

In our model we set $\pi = \lambda$, hidden variables. The precise repeatability of
the probability distribution of hidden variables (9.1) can be written as

$$\delta(S_{AB}, S_{CB}) = \delta(S_{AB}, S_{AC}) = 0,$$

where $\delta = \delta_\lambda$. Of course, we need not use such a precise coincidence in
probabilistic considerations.
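
The pseudometric $\delta_\pi$ is straightforward to compute for finite ensembles. A minimal sketch, assuming that an ensemble is represented simply by the list of values of the property $\pi$ on its elements (the function names and sample data are ours):

```python
from collections import Counter

def ensemble_probs(ensemble):
    """Proportional probabilities P_E(alpha) = |{e : pi(e) = alpha}| / |E|."""
    counts = Counter(ensemble)
    return {value: n / len(ensemble) for value, n in counts.items()}

def delta(e1, e2):
    """delta_pi(E1, E2) = sum over values alpha of |P_E1(alpha) - P_E2(alpha)|."""
    p1, p2 = ensemble_probs(e1), ensemble_probs(e2)
    return sum(abs(p1.get(v, 0.0) - p2.get(v, 0.0)) for v in set(p1) | set(p2))

e1 = ["a", "a", "b", "b"]
e2 = ["b", "a", "b", "a", "a", "b"]   # a different ensemble with the same distribution
e3 = ["a", "a", "a", "b"]
```

Here `delta(e1, e2)` is 0 although the ensembles differ (which is why $\delta_\pi$ is only a pseudometric), while `delta(e1, e3)` is 0.5.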
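Theorem 9.2. Let statistical ensembles satisfy condition

$$\delta(S_{AB}, S_{CB}),\ \delta(S_{AB}, S_{AC}) \leq \epsilon.$$

Then Bell’s inequality

$$|\langle A, B \rangle_{S_{AB}} - \langle C, B \rangle_{S_{CB}}| \leq (1 + 2\epsilon) - \langle A, C \rangle_{S_{AC}} \tag{9.2}$$

holds true.
Proof. We have

$$|\Delta| \leq \Big| \sum_{k=1}^{M} p_k^{AB} (a_k - c_k) b_k \Big| + \Big| \sum_{k=1}^{M} (p_k^{AB} - p_k^{CB}) c_k b_k \Big|$$

$$\leq \epsilon + \sum_{k=1}^{M} p_k^{AB} |a_k b_k| (1 - a_k c_k) \leq (1 + \epsilon) - \langle A, C \rangle_{S_{AC}} + \sum_{k=1}^{M} |p_k^{AB} - p_k^{AC}|\, |a_k c_k|$$

$$\leq (1 + 2\epsilon) - \langle A, C \rangle_{S_{AC}}.$$

$\Box$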
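
Inequality (9.2) can be probed numerically. The sketch below draws random $\pm 1$ observables and three unrelated distributions of $\lambda$ on a set of $M = 5$ values, takes $\epsilon$ to be the larger of the two distances, and evaluates the slack of (9.2). The sampling scheme and function names are our own assumptions.

```python
import random

def random_dist(rng, m):
    """Random probability distribution on m hidden-variable values."""
    weights = [rng.random() for _ in range(m)]
    total = sum(weights)
    return [w / total for w in weights]

def l1(p, q):
    """delta(S, S') for two distributions of lambda."""
    return sum(abs(x - y) for x, y in zip(p, q))

def slack_9_2(rng, m=5):
    """(1 + 2 eps) - <A,C>_{S_AC} - |Delta| for one random choice of model."""
    a, b, c = ([rng.choice((+1, -1)) for _ in range(m)] for _ in range(3))
    p_ab, p_cb, p_ac = (random_dist(rng, m) for _ in range(3))
    delta = sum((p_ab[k] * a[k] - p_cb[k] * c[k]) * b[k] for k in range(m))
    cov_ac = sum(p_ac[k] * a[k] * c[k] for k in range(m))
    eps = max(l1(p_ab, p_cb), l1(p_ab, p_ac))
    return (1 + 2 * eps) - cov_ac - abs(delta)

rng = random.Random(2)
worst = min(slack_9_2(rng) for _ in range(5000))
```

The value `worst` stays nonnegative up to rounding; of course, for unrelated ensembles $\epsilon$ is large and the bound (9.2) is correspondingly weak.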
We use the index $N$ to denote the cardinality of a statistical ensemble. If
probabilities $P_{S^N}(\lambda_k)$ stabilize when $N \to \infty$,

$$\lim_{N \to \infty} P_{S^N}(\lambda_k) = P(\lambda_k),$$

then $\delta(S_{AB}, S_{CB}), \delta(S_{AB}, S_{AC}) \to 0$, $N \to \infty$. This implies the precise Bell’s
inequality (8.1). On one hand, experimental violations of the latter inequality
can demonstrate that probabilities of hidden variables with respect to the ideal
infinite ensemble do not exist at all (they fluctuate when $N \to \infty$). On the
other hand, these violations can be a consequence of the fact that we do
not know the value of a constant $\epsilon$ in (9.2). It might be that, despite
the stabilization of probabilities for $N \to \infty$, this constant is quite large for
statistical ensembles which are used in quantum physics. In fact, ‘right Bell’s
inequality’ (9.2) could not be experimentally verified.

4. Stochastic hidden variables model and Bell’s mystification.
Here we keep to f-realism. Thus, for each physical observable $U$, its value
$U_f = U(\lambda)$ is the final value of $U$ after a measurement. Such a result of
a measurement depends not only on the value $\lambda$ of hidden variables, but
also on the state of an equipment $M_U$ which is used for measuring of $U$.
A measurement device $M_U$ is a complex macroscopic system whose state
depends on a huge number of fluctuating parameters. Denote the ensemble
of all possible states of $M_U$ by the symbol $\Sigma_U$:
$\Sigma_U = \{\omega_1^U, \ldots, \omega_{L_U}^U\}$. The
final value $U_f$ of an observable $U$ depends on both $\lambda$ and $\omega$:

$$U_f = U(\omega, \lambda).$$

We call such a model stochastic hidden variables model. Our definition of
a stochastic hidden variables model differs from the standard one, see
subsection 5. The standard definition is strongly connected with Kolmogorov’s
model.

Let $U$ and $V$ be physical observables, $U, V = \pm 1$. We start again with
the consideration of the frequency covariation $\langle U, V \rangle_{x_{UV}}$ with respect to
a collective $x_{UV}$ induced by the measurement of the pair $(U, V)$. The $x_{UV}$ is
obtained by measurements for an ensemble $S_{UV}$ of physical systems. Our aim
is again to represent the experimental covariation $\langle U, V \rangle_{x_{UV}}$ as an ensemble
covariation. Then we shall demonstrate that in the general
case it is impossible to perform for ensemble covariations Bell’s calculations,
(8.2)-(8.4).

Let $S_{UV} = \{d_1, \ldots, d_N\}$, where the $i$th measurement is performed for the
system $d_i$. Define functions $i \to \lambda(i)$ (the same function as above) and
$i \to \omega^U(i)$, $i \to \omega^V(i)$, states of apparatus $M_U$ and $M_V$, respectively, at
the instances, $t$ and $t'$, of measurements of $U$ and $V$ for the $i$th system. We
have

$$\langle U, V \rangle_{x_{UV}} = \frac{1}{N} \sum_{i=1}^{N} U(\omega^U(i), \lambda(i))\, V(\omega^V(i), \lambda(i)).$$

Set $D^U_{ks} = \{i : \lambda(i) = \lambda_k, \omega^U(i) = \omega^U_s\}$ and
$D^V_{kq} = \{i : \lambda(i) = \lambda_k, \omega^V(i) = \omega^V_q\}$,
$1 \leq k \leq M$, $1 \leq s \leq L_U$, $1 \leq q \leq L_V$. Set $n^{UV}_{ksq} = |D^U_{ks} \cap D^V_{kq}|$. It is
evident that

$$\sum_{k=1}^{M} \sum_{s=1}^{L_U} \sum_{q=1}^{L_V} n^{UV}_{ksq} = N.$$

Hence

$$\langle U, V \rangle_{x_{UV}} = \frac{1}{N} \sum_{ksq} n^{UV}_{ksq}\, u_{ks} v_{kq},$$

where $u_{ks} = U(\omega^U_s, \lambda_k)$, $v_{kq} = V(\omega^V_q, \lambda_k)$. We show that
$\langle U, V \rangle_{x_{UV}}$ can be
represented as ensemble covariation for an appropriate ensemble of physical
systems and states of measurement devices.

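First we note that $\langle U, V \rangle_{x_{UV}} \neq \langle U, V \rangle_{\Lambda \times \Sigma_U \times \Sigma_V}$ (compare with
subsection 5). For the latter covariation, we have

$$\langle U, V \rangle_{\Lambda \times \Sigma_U \times \Sigma_V} = \frac{1}{M L_U L_V} \sum_{k=1}^{M} \sum_{s=1}^{L_U} \sum_{q=1}^{L_V} u_{ks} v_{kq},$$

and in general the frequency $n^{UV}_{ksq}/N$ of the triple
$(\lambda = \lambda_k, \omega^U = \omega^U_s, \omega^V = \omega^V_q)$ differs from the uniform weight
$1/(M L_U L_V)$, even approximately for $M, N, L_U, L_V \to \infty$.

It is also evident that $\langle U, V \rangle_{x_{UV}} \neq \langle U, V \rangle_{S_{UV}}$. The latter covariation
is simply not well defined, because the ‘properties’
$\omega^U(i) = \omega^U_s$, $\omega^V(i) = \omega^V_q$
are not objective properties of elements of the ensemble $S_{UV}$. These
‘properties’ are determined by fluctuations of parameters in the apparatus $M_U$ and
$M_V$.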

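To find the right ensemble, we have to introduce two new ensembles,
namely, ensembles of states of the apparatus $M_U$ and $M_V$ (in the process
of measurements for the ensemble of physical systems $S_{UV}$):

$$S_{M_U} = \{a_1^U, \ldots, a_N^U\},\ a_j^U \in \Sigma_U, \qquad S_{M_V} = \{a_1^V, \ldots, a_N^V\},\ a_j^V \in \Sigma_V,$$

where $a_i^U = \omega^U(i)$, $a_i^V = \omega^V(i)$ are states of $M_U$ and $M_V$ at the instances
of the $i$th measurements. We set

$$\mathcal{S}_{UV} = \mathrm{diag}(S_{UV} \times S_{M_U} \times S_{M_V}) = \{D_1, \ldots, D_N\}, \quad D_j = (d_j, a_j^U, a_j^V).$$

Then $\pi(D_j) = (\lambda(j), \omega^U(j), \omega^V(j))$ is an objective property of elements of
the ensemble $\mathcal{S}_{UV}$ and

$$\langle U, V \rangle_{x_{UV}} = \langle U, V \rangle_{\mathcal{S}_{UV}} = \frac{1}{N} \sum_{i=1}^{N} U(\omega^U(i), \lambda(i))\, V(\omega^V(i), \lambda(i)).$$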

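We set

$$p^{UV}_{ksq} = P_{\mathcal{S}_{UV}}(D_j : \pi(D_j) = (\lambda_k, \omega^U_s, \omega^V_q)) = \frac{|\{D_j \in \mathcal{S}_{UV} : \pi(D_j) = (\lambda_k, \omega^U_s, \omega^V_q)\}|}{|\mathcal{S}_{UV}|}.$$

Hence we obtained that

$$\langle U, V \rangle_{x_{UV}} = \langle U, V \rangle_{\mathcal{S}_{UV}} = \sum_{ksq} p^{UV}_{ksq}\, u_{ks} v_{kq}.$$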

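Thus in the general case we have

$$\Delta = \langle A, B \rangle_{x_{AB}} - \langle C, B \rangle_{x_{CB}} = \langle A, B \rangle_{\mathcal{S}_{AB}} - \langle C, B \rangle_{\mathcal{S}_{CB}} = \sum_{ksq} p^{AB}_{ksq}\, a_{ks} b_{kq} - \sum_{ksq} p^{CB}_{ksq}\, c_{ks} b_{kq}$$

and

$$\langle A, C \rangle_{x_{AC}} = \langle A, C \rangle_{\mathcal{S}_{AC}} = \sum_{ksq} p^{AC}_{ksq}\, a_{ks} c_{kq}.$$

We suppose now that probabilities $p^{UV}_{ksq}$ do not depend on ensembles:

$$p_{ksq} = p^{AB}_{ksq} = p^{CB}_{ksq} = p^{AC}_{ksq}. \tag{9.3}$$

In particular, we suppose that all measurement devices have the same set of
states (of parameters):

$$\Sigma = \Sigma_A = \Sigma_B = \Sigma_C \quad (\text{and } L = L_A = L_B = L_C). \tag{9.4}$$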


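Then we obtain

$$\Delta = \sum_{ksq} p_{ksq} (a_{ks} - c_{ks})\, b_{kq}.$$

However, we could not repeat trick (8.3) of the proof of Bell’s inequality.
The equality $a_{ks}^2 = 1$ does not give the possibility to proceed with the proof. Of
course, we have

$$|\Delta| = \Big| \sum_{ksq} p_{ksq} (a_{ks} - a_{ks}^2 c_{ks})\, b_{kq} \Big| \leq \sum_{ksq} p_{ksq} |a_{ks} b_{kq}| (1 - a_{ks} c_{ks}) \leq 1 - \sum_{ksq} p_{ksq}\, a_{ks} c_{ks}.$$

But in general $\sum_{ksq} p_{ksq}\, a_{ks} c_{ks}$ is not larger than
$\langle A, C \rangle_{x_{AC}} = \sum_{ksq} p_{ksq}\, a_{ks} c_{kq}$.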

Therefore, if we keep to f-realism, even stability condition (9.3) (for
combined ensembles of physical systems and states of measurement apparatus)
does not imply Bell’s inequality. A new source of violation of Bell’s
inequality is the inconsistency of random fluctuations for two measurement devices
$M_U$ and $M_V$. In general $\omega^U(i) \neq \omega^V(i)$.

Suppose that it could be possible to control states of $M_U$ and $M_V$ and
choose $\omega$ for $M_U$ and $M_V$ in the consistent way:

$$\omega(i) = \omega^U(i) = \omega^V(i).$$

Then the ensemble $\mathcal{S}_{UV}$ would contain only triples of the form
$(\lambda_k, \omega_s, \omega_s)$ and

$$p^{UV}_{ksq} = P_{\mathcal{S}_{UV}}(\lambda_k, \omega_s, \omega_q) = 0, \quad s \neq q. \tag{9.5}$$


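In such a case we obtain covariations:

$$\langle U, V \rangle_{\mathrm{ideal}} = \frac{1}{N} \sum_{i=1}^{N} U(\omega^U(i), \lambda(i))\, V(\omega^V(i), \lambda(i)) = \sum_{ks} p^{UV}_{ks}\, u_{ks} v_{ks},$$

where $p^{UV}_{ks} = p^{UV}_{kss}$. If we also suppose the validity of (9.3), we obtain

$$|\Delta_{\mathrm{ideal}}| = \Big| \sum_{ks} p_{ks} (a_{ks} - c_{ks})\, b_{ks} \Big| \leq 1 - \sum_{ks} p_{ks}\, a_{ks} c_{ks} = 1 - \langle A, C \rangle_{\mathrm{ideal}}.$$

However, ideal covariations have no direct connection to experimental
frequency covariations.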

Nevertheless, we can formulate the following mathematical theorem: 

Theorem 9.3. Let statistical ensembles (physical systems/measurement 
apparatus) satisfy conditions (9.3) and (9.5). Then Bell’s inequality (8.1) 
holds true for covariations with respect to these ensembles. 

Therefore, to obtain Bell’s inequality in the framework of f-realism, we
have to suppose: (1) statistical repeatability of the ensemble distribution
of hidden variables $\lambda$ in ensembles which are used for measurements;
(2) statistical repeatability of fluctuations of states $\omega$ in ensembles
of the equipment; (3) consistency of fluctuations of all measurement devices.

Even if the reader denies the possibility of violations of (1) or (2), he
must agree that condition (3) seems to be nonphysical: we could never control
fluctuations of the huge number of parameters in the equipment.

Instead of precise coincidence (9.3), it is possible to consider (under the
assumption (9.4)) the statistical coincidence based on the quantity:


$$\delta(S_{AB}, S_{CB}) = \sum_{k,s,q} \left| p^{AB}_{ksq} - p^{CB}_{ksq} \right|.$$

Here $\delta = \delta_\pi$ for the property $\pi(i) = (\lambda(i), \omega^U(i), \omega^V(i))$. We remark that
condition (9.3) of the precise coincidence can be written as

$$\delta(S_{AB}, S_{CB}) = 0$$

for every two pairs of observables (A, B) and (C, B). We also introduce a new
quantity which is a statistical measure of inconsistency of ensembles $S_{M_U}$
and $S_{M_V}$:

$$\sigma(S_{UV}) = \sum_{s \neq q} P_{S_{UV}}(\omega^U = \omega_s, \omega^V = \omega_q) = \sum_{k} \sum_{s \neq q} p^{UV}_{ksq}.$$

Condition (9.5) of the precise consistency for states of $M_U$ and $M_V$ can be
written in the form:

$$\sigma(S_{UV}) = 0.$$


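Theorem 9.4. Let statistical ensembles (physical systems/measurement
apparatus) satisfy conditions:

$$\delta(S_{AB}, S_{CB}),\ \delta(S_{AB}, S_{AC}) < \epsilon \quad \text{and} \quad \sigma(S_{AB}),\ \sigma(S_{CB}),\ \sigma(S_{AC}) < \epsilon'.$$

Then inequality

$$| \langle A, B \rangle_{S_{AB}} - \langle C, B \rangle_{S_{CB}} | \le (1 + 2\epsilon + 3\epsilon') - \langle A, C \rangle_{S_{AC}}$$

holds true.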
Proof. We have 


|A| < E+ Ido pia (aks — Cks) Okq| 


ksq 


<e+ 2e + 5 pi |(axs im Crs) bs | <e+ 2e' a >S pee (1 aa ksCks) 
ks ks 


Le+4e + Xo pe Psg ( (1 z AksCkq) Š < (1 +2e+ Ac’) — -5 PK aksCkq- 
ksq ksq 
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= 

5. Right choice of probability distributions for stochastic hidden
variables models. Typically stochastic hidden variables models are defined as
models with probabilities ($\epsilon = \pm 1$)

$$P(U = \epsilon) = \int_\Lambda P(U = \epsilon/\lambda)\, d\rho(\lambda), \qquad (9.6)$$

where $\rho(\lambda)$ is the probability distribution of hidden variables and $P(U = \epsilon/\lambda)$ is
the conditional probability to measure the value $U = \epsilon$ for the quantum system
having the hidden state $\lambda$. Then the joint probability distribution can be defined
(at least mathematically) as

$$P(U_1 = \epsilon_1, U_2 = \epsilon_2, U_3 = \epsilon_3) = \int_\Lambda P(U_1 = \epsilon_1/\lambda) P(U_2 = \epsilon_2/\lambda) P(U_3 = \epsilon_3/\lambda)\, d\rho(\lambda). \qquad (9.7)$$
In fact, to derive Bell’s inequality in the Kolmogorov framework, it is sufficient
to use the existence (on the mathematical level) of the joint probability
distribution (9.7). However, considerations in the framework of the ensemble
probability theory demonstrated that ‘probabilities’ (9.6) have no physical
meaning. These are probabilities with respect to the ensemble $\Lambda \times \Omega_U$.
However, physical probabilities are probabilities with respect to the ensemble
$S_U = {\rm diag}(S_\Lambda \times S_{M_U})$, where $S_\Lambda = \{\lambda_1, \ldots, \lambda_N\}$ is the
ensemble of quantum systems used in the measurement.
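
The statement that the mere mathematical existence of the joint distribution (9.7) suffices for Bell’s inequality can be illustrated on a discrete toy model. The sketch below is an assumption-laden illustration, not a model taken from the text: the three hidden states, their weights `rho` and the conditional probabilities `p_plus` are hypothetical numbers, and the integral in (9.7) is replaced by a finite sum.

```python
import itertools
from fractions import Fraction

# Hypothetical discrete model: three hidden states with weights rho and
# conditional probabilities P(U_j = +1 / lambda_k) for three observables.
rho = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
p_plus = [
    [Fraction(9, 10), Fraction(1, 5), Fraction(1, 2)],   # U_1
    [Fraction(1, 10), Fraction(7, 10), Fraction(1, 2)],  # U_2
    [Fraction(3, 5), Fraction(2, 5), Fraction(1, 10)],   # U_3
]

def cond(j, eps, k):
    """P(U_j = eps / lambda_k) for eps equal to +1 or -1."""
    return p_plus[j][k] if eps == 1 else 1 - p_plus[j][k]

# Discrete analogue of (9.7): joint distribution of (U_1, U_2, U_3).
joint = {
    eps: sum(rho[k] * cond(0, eps[0], k) * cond(1, eps[1], k) * cond(2, eps[2], k)
             for k in range(len(rho)))
    for eps in itertools.product((1, -1), repeat=3)
}

def cov(i, j):
    """Covariation <U_i, U_j> computed from the joint distribution."""
    return sum(prob * eps[i] * eps[j] for eps, prob in joint.items())

total = sum(joint.values())          # normalization of the joint distribution
lhs = abs(cov(0, 1) - cov(2, 1))     # |<U_1,U_2> - <U_3,U_2>|
rhs = 1 - cov(0, 2)                  # 1 - <U_1,U_3>
print(total, lhs, rhs, lhs <= rhs)
```

Exact rational arithmetic is used so that the comparison is not blurred by rounding. The check says nothing about whether the product ensemble behind (9.6) is the physically relevant one, which is the point made in the text.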

6. Individual and ensemble nonreproducibilities. Our hypothesis
on the nonreproducibility of the probability distribution of hidden variables in
statistical ensembles used in quantum experiments is related to De Baere’s [25],
[26] hypothesis on the individual nonreproducibility. He mentioned that there are
reasons (see also [33]) that it would be impossible to prepare the quantum system
with the same value $\lambda$ for measurements of different observables. Thus
probabilities $P(U_j = \epsilon_j/\lambda)$, j = 1, 2, 3, could not be defined for the same $\lambda$.
The latter implies that joint probability distribution (9.7) does not exist and
Bell’s inequality could not be derived. We remark that deterministic hidden
variables models satisfy the condition of the individual reproducibility. However,
the ensemble reproducibility can be violated.

7. Contextualistic realism. All our considerations in the f-realist
framework can be repeated in the framework of so-called contextualistic realism
(N. Bohr and W. Heisenberg, see, for example, [33]). By this interpretation of
quantum mechanics it does not seem allowed to assume the possibility of attributing
the result of a quantum measurement to the object as a property already possessed
before the measurement. Here quantum observables are attributed as properties to
quantum mechanical objects. Thus a property of the quantum system is defined
only in the context of a measurement. There is no difference in the mathematical
description between f-realism and contextualistic realism. In both frameworks an
observable can be described as a function $U = U(\omega, \lambda)$, where $\omega$ and $\lambda$ are
hidden states of a measurement device and quantum system, respectively.


8. Other probabilistic models which do not contradict local
realism. L. Accardi [1] used a non-Kolmogorovean model without Bayes’
formula to eliminate Bell’s inequality from considerations related to the spin
model. Recently he also developed a new non-Kolmogorovean model which
gives an explanation of violations of Bell’s inequality, see [2].


I. Pitowsky [92], [93] discussed the possibility that some nonmeasurable
sets can be physical events, i.e., some physical observables may be
nonmeasurable. There is no Bell’s inequality in this approach. Thus there is no
problem with violations of Bell’s inequality. This model is consistent with known
polarization phenomena and the existence of macroscopic magnetism. He also
proposed a thought experiment which indicates a deviation from the
predictions of quantum mechanics. We note that A. N. Kolmogorov already
discussed ‘generalized probabilities’ on the algebra $F_\Omega$ of all subsets of $\Omega$.
Pitowsky discussed the relation of the ‘Banach-Tarski paradox’ (Theorem 5.1,
Chapter 1) to foundations of probability theory. It seems that Kolmogorov
suspected that ‘nonmeasurable events’ could play some role in probability
theory. The model of Pitowsky gives an interesting application of such
‘generalized probabilities.’ I. Pitowsky noticed:


“This so called ‘Banach-Tarski paradox’ is not a paradox at all. The pieces 
into which the ball is cut are nonmeasurable sets, that is, one cannot assign them 
numbers that indicate their volume since this will clearly violate the additivity or 
invariance of ‘volume’. In spite of this explanation and in spite of independent 
proofs that nonmeasurable sets exist, the Banach-Tarski result was taken as an 
unfortunate consequence of the axiom of choice (which is nevertheless, essential 
in some fields of ‘good’ mathematics). Suppose, however, that we reverse this 
attitude and maintain that the subsets into which the ball is decomposed exist in 
physical reality. These hidden pieces could be detected in two ‘states’. The first 
is a ‘one-ball state’ and the second a ‘two-ball state’. In each state the pieces do 
have a ‘volume’ which depends, however, on their mutual configuration. Assume 
that we have a source that emits five balls in the first state. On the way from the 
source to a counter two of the balls spontaneously transform to the second state. 
The counter, which does not distinguish between the states, will detect seven balls.
This rather simplistic example serves to indicate that one can ‘perform miracles’ if
one is willing to accept the physical reality of some highly abstract set-theoretical
objects. In particular, if such assumptions are made, it is possible to account for 
interference effects in a completely mechanistic way without introducing wavelike 
nonlocal components to the theory. 

Mathematicians, in particular applied mathematicians, were reluctant to take
nonmeasurable sets seriously. As a result there exists no mathematical theory that 
relates nonmeasurable distributions with relative frequencies.” 

Such an extension of probability theory was created by I. Pitowsky and
then strongly mathematically improved by S.P. Gudder [42]. He introduced
the concept of a probability manifold M. The global properties of M inherited
from its local structure were then considered. It was shown that a
deterministic spin model due to Pitowsky falls within this general framework.
Finally, Gudder constructed a phase-space model for nonrelativistic quantum
mechanics. These two models give the same global description as conventional
quantum mechanics. However, they also give a local description which is
not possible in conventional quantum mechanics.

Remark 9.1. Non-Kolmogorovean probabilistic models of Accardi, Pitowsky
and Gudder have a higher level of abstraction than the original Kolmogorov model.
This is one of the explanations why these models are not so popular in quantum
physics. On the other hand, we showed that Bell’s inequality does not contradict
local realism on the basis of the primary (rather primitive from the mathematical
viewpoint) probabilistic models, namely, the ensemble and frequency models. It
seems that our models have a closer relation to physical reality.

We shall discuss in Chapter 3 the use of negative probabilities and in 
Chapter 4 the use of p-adic probabilities to eliminate Bell’s inequality from 
considerations. 

Conclusion. ‘Bell’s inequality’ does not imply nonexistence of local hid- 
den variables. Physical reality may be nonlocal. Physical reality may be 
nonobjective. However, both these features of physical reality are not related 
in any way to Bell’s inequality.

Negative probabilities


In this chapter we study possibilities to extend probability theory to
describe numerous physical models with negative probabilities. Of course,
negative probabilities cannot appear in Kolmogorov’s probability theory, in
which the probabilities of events must be nonnegative real numbers. Therefore
we have to turn back to the original probability formalisms, namely, ensemble
and frequency.


1 The origin of negative probabilities in the 
ensemble and frequency theories 


1. Ensemble approach: fluctuations of finite approximations. In the
ensemble framework negative probabilities could not appear for finite
statistical ensembles $S_N = \{s_1, s_2, \ldots, s_N\}$. However, such generalized
probabilities can naturally appear for infinite statistical ensembles S as the
results of the limit procedure:

$$P_S(A = a) = \lim_{N \to \infty} \frac{|S(A = a) \cap S_N|}{|S_N|}, \qquad (1.1)$$


where a sequence of finite ensembles $\{S_N\}$ gives an approximation of the
infinite ensemble S. If this limit does not exist in R, then some regularization
procedures (for example, the summation of divergent series or integrals) can
induce negative values for $P_S(A = a)$. Of course, in such a situation it
would be natural to leave the domain of real analysis and consider some
non-Archimedean number systems which contain actual infinities. In this case
the probability $P_S(A = a)$ can be defined directly as the proportion:

$$P_S(A = a) = \frac{|S(A = a) \cap S|}{|S|}. \qquad (1.2)$$


In Chapter 4 we shall use the system of p-adic numbers $Q_p$ for such a
purpose (another natural possibility is to use nonstandard numbers, [3]). In $Q_p$
proportion (1.2) can be a negative rational number (as well as a rational
number which is larger than 1).

2. Ensemble approach: split of conventional probabilities.
Nonexistence of limits (1.1) is not the unique source of negative probabilities for
infinite ensembles S. There may be situations (see, for example, the p-adic
framework) such that limit (1.1) (for some a) exists and is equal to zero (from the
viewpoint of real analysis). For example, for the uniform distribution on
S = N, we have $P_S(A = n) = \lim_{N \to \infty} \frac{1}{N} = 0$ for all n = 1, 2, ... . However,
some regularization of this limit procedure can produce nonzero coefficients
Pë (a). In the mentioned p-adic framework such coefficients (defined by (1.1) 
with respect to the p-adic topology or directly by (1.2) with the aid of actual
infinities) are always negative (rational) numbers¹. Thus regularizations of
(1.1) can induce the split of zero conventional probabilities into a set of new
labels which can be negative numbers. These new labels can be interpreted
as infinitely small probabilities. Such a split of conventional probabilities is
not a feature of zero probabilities only. For example, probability one can
also be split into a set of new labels which are interpreted as probabilities which
differ from probability one by infinitely small probabilities. These are
‘practically one probabilities’. In all p-adic examples such new probabilities are
given by rational numbers which are larger than one². Similar splits can be
obtained for other rational probabilities $q \in (0,1)$. If $0 < q < 1$, $q \in Q$, then
we have two sets of labels $L_{<q}$ and $L_{>q}$. They denote, respectively,
probabilities $a = q - \lambda$ and $a = q + \lambda$, where $\lambda$ is an infinitely small
probability. In p-adic examples we have $L_{>0} \subset Q \cap (-\infty, 0)$ and
$L_{<1} \subset Q \cap (1, +\infty)$ (see Chapter 4 for the details).

¹In fact, we could not prove such a general theorem in the framework of p-adic analysis.
But numerous examples demonstrate this feature of the p-adic split of zero conventional
probabilities.

²This is natural: if $P(A) = q < 0$ is an infinitely small probability, then
$P(\bar{A}) = 1 - P(A) = 1 - q > 1$ is a probability which negligibly differs from 1 and vice versa.

On one hand, probabilities q < 0 (and q > 1) demonstrate irregular
behaviour ($N \to \infty$) of approximations of probabilities $P_S$ with respect to an
infinite ensemble S by probabilities $P_{S_N}$ with respect to finite sub-ensembles
$S_N$. On the other hand, they can describe the fine internal structure of S (via
split of conventional probabilities). We note that from the physical point of
view the irregularity of approximations means that it is impossible to prepare
for all measurements for a quantum state $\phi$ (describing S) finite ensembles
$S_N$ with identical statistical properties.

3. Frequency approach: irregularity of behaviour of frequencies.
In the frequency framework negative probabilities could not appear in the
classical theory of R. von Mises which is based on the principle of the
statistical stabilization of frequencies with respect to the real metric. However,
if we assume that for some ‘quasi-random sequences’ $x = (x_1, x_2, \ldots, x_N, \ldots)$
this principle can be violated, namely, the limit

$$P_x(a) = \lim_{N \to \infty} \nu_N(a; x) \qquad (1.3)$$

does not exist in R, then some regularization procedures Re for (1.3) can
produce negative values (as well as values which are larger than 1) for $P_x(a)$.
One of the possibilities for such a regularization is to change the topology on
the set of rational numbers Q in which we study the convergence of relative
frequencies. In Chapter 4 we shall use the p-adic topology for such a purpose.

4. Frequency approach: split of Mises’ probabilities. Another
source of frequency probabilities q < 0 and q > 1 is the split of Mises’
probabilities. For example, the fact that the frequency probability
$P^{\rm Mises}(A) = \lim_{N \to \infty} \nu_N(A; x) = 0$ does not imply that the event A should never occur.
Therefore it is reasonable to take such events into account by using new
labels.

Let us consider two events A and B which have zero frequency
probabilities:

$$P_x(A) = \lim_{N \to \infty} \nu_N(A; x) = 0, \quad P_x(B) = \lim_{N \to \infty} \nu_N(B; x) = 0, \qquad (1.4)$$

in R. We are interested in the problem: Which event, A or B, has the larger
probability? Of course, this question is meaningless from the viewpoint of
Mises’ probability theory. However, this problem can be solved by extending
the set of labels for probabilities.

In the frequency framework we can obtain new sets of labels automatically
by using new topologies for the statistical stabilization (by finding limits (1.3)
with respect to new topologies)³. Each topology of the statistical stabilization
induces its own set of labels for split Mises’ probabilities. For example, it
may be that, despite (1.4) in R, we have

$$P^{\tau}_x(A) = \lim_{N \to \infty} \nu_N(A; x) \neq 0, \quad P^{\tau}_x(B) = \lim_{N \to \infty} \nu_N(B; x) \neq 0 \qquad (1.5)$$

for some topology $\tau$ on Q. If we choose the p-adic topology $\tau = \tau_p$, then
in examples studied by the author p-adic probabilities (1.5) are represented
by negative rational numbers. Thus by using negative probabilities we can
split zero (Mises’) probability. The same split can be obtained for all Mises’
probabilities $q \in [0,1] \cap Q$.
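
How a sequence of relative frequencies can tend to zero in R and at the same time approach a negative rational number in the p-adic topology can be seen on a toy computation. The sketch below is not one of the author’s examples from Chapter 4; the counts `n_k`, `N_k`, the prime `p = 2` and the helper names are hypothetical choices made only for illustration, and convergence is checked only along the chosen checkpoints $N_k$. It uses the standard p-adic norm $|x|_p = p^{-v_p(x)}$.

```python
from fractions import Fraction

def padic_valuation(x: Fraction, p: int) -> int:
    """Exponent of p in the nonzero rational x (it can be negative)."""
    if x == 0:
        raise ValueError("valuation of zero is infinite")
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def padic_norm(x: Fraction, p: int) -> Fraction:
    """|x|_p = p**(-v_p(x)), with |0|_p = 0."""
    if x == 0:
        return Fraction(0)
    return Fraction(p) ** (-padic_valuation(x, p))

p = 2
limit = Fraction(1, 1 - p)  # candidate p-adic limit; equals -1 for p = 2

rows = []
for k in range(1, 11):
    n_k = sum(p**j for j in range(k))  # occurrences of the event: 1 + p + ... + p^(k-1)
    N_k = 1 + p**(2 * k)               # number of trials at the k-th checkpoint
    nu = Fraction(n_k, N_k)            # relative frequency, a rational number in [0, 1]
    rows.append((float(nu), padic_norm(nu - limit, p)))

for real_value, padic_distance in rows:
    print(f"{real_value:.6f}  {padic_distance}")
```

The first column shrinks toward 0, while the second column, the p-adic distance to 1/(1 - p), is halved at every checkpoint. The same rational data therefore carry the label 0 in the real topology and a negative label in the p-adic one.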

On one hand, probabilities q < 0 and q > 1 demonstrate the violation
of the principle of the statistical stabilization (the law of large numbers)
for some ‘quasi-random’ sequences. On the other hand, they describe (with
the aid of new topologies on Q) the fine internal structure of some Mises’
collectives.

³The real topology is only one of many topologies on the set of rational numbers Q
which contains frequencies $\nu_N = n/N$.

5. Where are negative probabilities? However, the reader may ask:
Why could we not find negative probabilities in physical experiments? One of
the reasons is that, in fact, we have never tried to find them. All our experimental
methodology is based on the principle of the statistical stabilization (the law of
large numbers). All experiments are prepared in such circumstances that relative
frequencies must stabilize. This is the result of our cognitive evolution. In the
process of evolution the brain extracted from the chaotic (and lawless) reality
phenomena which satisfy the principle of the statistical stabilization (repeatability
in the average). These and only these phenomena are considered by the brain
as real physical phenomena. Negative probabilities give the possibility to extend
the range of physical phenomena by considering phenomena which violate the
principle of the statistical stabilization. Another reason for the absence of negative
probabilities in the experimental framework is the common use of real analysis for
the study of experimental statistical data. However, these data are always rational
and in principle other topologies on Q (different from the real one) can be used for
studying them. In particular, we have to pay more attention to events A with
zero conventional (Kolmogorov or Mises) probabilities, $P^{\rm conv}(A) = 0$. From our
viewpoint such events are not less physical than events with positive probabilities.
By using negative probabilities we can consider in analytical calculations events A
such that $P^{\rm conv}(A) = 0$. In this way we can clarify the hidden internal structure
of some events B with positive conventional probabilities. We shall study this
question carefully in the next section.


6. The formula of total probability as an average procedure. We
consider a quantum measurement for quantum systems prepared in a state $\phi$.
We suppose that each quantum system s which is taken for this measurement
has a hidden state which determines (with some probability) a result of the
measurement for s (see Chapter 2). The set of hidden states is denoted
by $\Lambda$. The number of hidden states may be infinite.


Remark 1.1. Of course, in a laboratory we can produce only a finite
ensemble $S_N = \{s_1, \ldots, s_N\}$ of quantum systems which have a finite number of hidden
states $\lambda_1, \ldots, \lambda_n$, $n \le N$. However, different finite ensembles $S_N, S'_N, \ldots$ are
used in different experiments. It is natural to assume that these finite ensembles
are sub-ensembles of one infinite ensemble S. The quantum state $\phi$ describes this
infinite ensemble. The infinite cardinality of S induces the impression that S is
just an ideal mathematical abstraction. However, suppose, for example, that each
electron s has an extremely complex internal structure. Then, in fact, each s must
be described by its own (individual) internal state $\lambda$. In this case the number of
all possible states (for all electrons in the universe) is really infinite.


The (hidden) probability for $\lambda$ in S is denoted by the symbol $p_\lambda$. On the
basis of our previous considerations (see Chapter 2) it is natural to suppose
that some of $p_\lambda$ may be nonconventional probabilities; in particular, they
may be negative⁴.

In the process of a measurement each state $\lambda$ is transformed into a new
state $\lambda'$ (due to an interaction between the quantum system and the
equipment). Denote probabilities of this transition by $p_{\lambda\lambda'}$. Some of these
probabilities can be negative (in particular, the law of large numbers can be
violated for some transitions $\lambda \to \lambda'$). In the measurement we observe events A
consisting of some sets of states $\lambda'$ (in principle these sets can be infinite). By
the formula of total probability we obtain:

$$P(A) = \sum_{\lambda \in \Lambda} p_\lambda \sum_{\lambda' \in A} p_{\lambda\lambda'}. \qquad (1.6)$$

⁴In particular, they may be infinitely small probabilities. For example, if each electron
in the universe has its own state $\lambda$, then $p_\lambda = \lim_{N \to \infty} \frac{1}{N} = 0$ (from the viewpoint of real
analysis). Negativity of $p_\lambda$ can also be a consequence of the violation of the law of large
numbers. Such a violation for hidden states $\lambda$ is quite natural if $|\Lambda| = \infty$. For a concrete
$\lambda$, the behaviour of frequencies $\nu_N(\lambda; x)$ can strongly depend on a sample x. There are no
reasons to assume that two different samples of quantum systems $S_N = \{s_1, \ldots, s_N\}$ and
$S_M = \{\tilde{s}_1, \ldots, \tilde{s}_M\}$ must produce samples $x = (\lambda_1, \ldots, \lambda_N)$ and
$\tilde{x} = (\tilde{\lambda}_1, \ldots, \tilde{\lambda}_M)$ having the same probability distribution (because our
macro equipment could not control the statistical behaviour of hidden parameters).


In fact, this is the average procedure with respect to the ensemble $\Lambda$ of hidden
states $\lambda$, transitions $\lambda \to \lambda'$ and states $\lambda'$ which are identified in the observed
event A. The result of this procedure can be a conventional (Kolmogorov
or Mises) probability, despite the possibility that some of probabilities
$p_\lambda, p_{\lambda\lambda'} < 0$ or $p_\lambda, p_{\lambda\lambda'} > 1$.

The ensemble and frequency explanations of this phenomenon have
already been presented in section 4, Chapter 2. For example, in the frequency
framework fluctuations of frequencies $\nu_N(\lambda)$ and (or) $\nu_N(\lambda'/\lambda)$ can
compensate each other and produce the statistical stabilization. Examples 4.1 and
4.2 showed that such a behaviour can be demonstrated even in the case of
a finite set $\Lambda$. Thus one of the sources of conventional probabilities in (1.6)
is that simultaneous (chaotic) fluctuations can produce on average the
statistical stabilization. Another source is infinite statistical ensembles with
infinitely small initial probabilities $p_\lambda < 0$ and (or) transition probabilities
$p_{\lambda\lambda'} < 0$. Infinite sums of infinitely small (negative) probabilities might
produce conventional positive probabilities.

In all previous considerations the formula of total probability must be
regularized via some procedure (for example, by using a new number system
to find the limits of fluctuating frequencies, see Chapter 4). In general we
could not even suppose the validity of Bayes’ formula (even for one fixed
state $\lambda$ and transition $\lambda \to \lambda'$).

Example 1.1. Example 4.1 (Chapter 2) can be generalized by considering
the infinite set of hidden states $\lambda \in [0, \pi]$. We choose the uniform
probability distribution on $[0, \pi]$ as the initial probability distribution $p_\lambda$ (these
are infinitely small probabilities). However, in the framework of real analysis
we could not represent $p_\lambda$ as proportional probabilities (1.2). The only thing
which we can do is to use the normalized Lebesgue measure on $[0, \pi]$ to
represent $p_\lambda$. Let us consider an observable B = 0, 1 ($\lambda' = B$) with conditional
frequencies

$$\nu_k(0/\lambda) \approx \sin^2 k\lambda, \quad \nu_k(1/\lambda) \approx \cos^2 k\lambda, \quad k \to \infty, \ \lambda \in [0, \pi].$$

If $\lambda \neq \pi l$, l = 0, 1, 2, ..., then conditional frequency probabilities
$P_x(B = 0/\lambda) = \lim_{k \to \infty} \sin^2 k\lambda$ and $P_x(B = 1/\lambda) = \lim_{k \to \infty} \cos^2 k\lambda$ do not exist.
But the average procedure based on the (integral) formula of total probability
gives well defined conventional probabilities for values of B:

$$P(B = 0) = \lim_{k \to \infty} \int_0^\pi \sin^2 k\lambda \, dp_\lambda = \frac{1}{2}, \quad P(B = 1) = \lim_{k \to \infty} \int_0^\pi \cos^2 k\lambda \, dp_\lambda = \frac{1}{2}.$$

In section 3 we shall study examples in which nonexistence of conventional
conditional probabilities implies negativity of generalized conditional
probabilities.
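
The contrast in Example 1.1 between fluctuating conditional frequencies and a stable average can be checked with a few lines of numerics. This is a minimal sketch under stated assumptions: the integral over $[0, \pi]$ is taken with respect to the normalized Lebesgue measure $d\lambda/\pi$ and approximated by the midpoint rule, and the hidden state `lam = 1.0`, the step count and the function name `average_sin2` are hypothetical choices.

```python
import math

def average_sin2(k: int, steps: int = 20000) -> float:
    """Midpoint-rule value of the integral of sin^2(k*lam) over [0, pi]
    with respect to the normalized Lebesgue measure d(lam)/pi."""
    h = math.pi / steps
    total = sum(math.sin(k * (i + 0.5) * h) ** 2 for i in range(steps))
    return total * h / math.pi

lam = 1.0  # a fixed hidden state that is not a multiple of pi
pointwise = [math.sin(k * lam) ** 2 for k in range(1, 201)]
averaged = [average_sin2(k) for k in (1, 5, 25, 125)]

# For the fixed state the conditional frequencies keep oscillating ...
print(min(pointwise[100:]), max(pointwise[100:]))
# ... while the averages over all hidden states stay near 1/2.
print(averaged)
```

For a single $\lambda$ the values of $\sin^2 k\lambda$ range over almost the whole interval [0, 1] even for large k, whereas the averaged quantity does not move away from 1/2.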
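Example 1.2. The previous example can be easily modified to obtain
a model in which probabilities $P_x(\lambda) = p_\lambda$ do not exist. Let
$d\nu_k(\lambda) \approx \frac{2}{\pi} \sin^2 k\lambda \, d\lambda$, $k \to \infty$, and let $\nu_k(B = \beta/\lambda) \approx \frac{1}{2}$, $k \to \infty$. Then the
frequency probability distribution $p_\lambda$ does not exist. But via the formula of total
probability we obtain in the average:

$$P(B = 0) = \lim_{k \to \infty} \frac{1}{2} \int_0^\pi d\nu_k(\lambda) = \frac{1}{2}, \quad P(B = 1) = \lim_{k \to \infty} \frac{1}{2} \int_0^\pi d\nu_k(\lambda) = \frac{1}{2}.$$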

7. Negative probabilities and the principle of complementarity.
The considerations of the previous section on the formula of total
probability as an average procedure are based on ideas of P. Dirac [34] and R.
Feynman [37]. In particular, R. Feynman considered a roulette which has
two internal (non-observed) states $\lambda_1$ and $\lambda_2$ and three observed states 1, 2, 3.
By simple numerical examples (that the reader can produce by himself) he
demonstrated that observed events can have positive conventional
probabilities $p_j > 0$, j = 1, 2, 3, despite negativity of some hidden probabilities
$p_{\lambda_1}, p_{\lambda_2}$ or conditional probabilities $p_{\lambda_1 j}, p_{\lambda_2 j}$, j = 1, 2, 3. However, neither
Dirac nor Feynman could propose a mathematical explanation of the origin
of negative probabilities (they considered negative probabilities as just
formal quantities which could be useful in some calculations). I have found the
frequency and ensemble roots of negative probabilities. For example, we can
build Feynman’s roulette by using ‘quasi-random’ generators for states $\lambda_1$
and $\lambda_2$ or for transitions $\lambda_1 \to j$ and $\lambda_2 \to j$ which simulate the statistical
models of Examples 4.1 and 4.2 (Chapter 2), respectively.
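
A numerical example of the kind the reader is invited to produce can be written down directly from the formula of total probability. The numbers below are hypothetical and chosen only for illustration; the dictionary names `hidden`, `transition` and `observed` are likewise assumptions of this sketch. One conditional probability is negative and one exceeds 1, yet every observed probability lies in (0, 1).

```python
from fractions import Fraction as F

# Hypothetical numbers chosen only for illustration.
hidden = {"lambda_1": F(7, 10), "lambda_2": F(3, 10)}
transition = {
    "lambda_1": {1: F(3, 10), 2: F(6, 10), 3: F(1, 10)},
    # For lambda_2 one entry is negative and one exceeds 1; the row still sums to 1.
    "lambda_2": {1: F(1, 10), 2: F(-4, 10), 3: F(13, 10)},
}

# Formula of total probability: p_j = sum over hidden states of p_lambda * p_{lambda j}.
observed = {
    j: sum(hidden[s] * transition[s][j] for s in hidden)
    for j in (1, 2, 3)
}
print(observed)
```

The negative entry never shows up in isolation: it is always averaged together with the positive contribution of the other hidden state, which is the sense in which such quantities remain unobserved.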

On the basis of our interpretation of negative probabilities it would be
interesting to discuss the idea of R. Feynman on a connection between negative
probabilities and the principle of complementarity in quantum mechanics, see
[37]. As far as I could understand, R. Feynman is an adherent of z-realism (at
least in this paper).

In the framework of subsection 6 we consider two physical properties A
and B. Thus (despite possible fluctuations of frequencies and conditional
frequencies for hidden variables) frequencies

$$\nu_N(A_\alpha) = \sum_{\lambda \in \Lambda} \nu_N(\lambda) \sum_{\lambda' \in A_\alpha} \nu_N(\lambda'/\lambda), \quad \text{where } A_\alpha = \{A = \alpha\}, \qquad (1.7)$$

$$\nu_N(B_\beta) = \sum_{\lambda \in \Lambda} \nu_N(\lambda) \sum_{\lambda' \in B_\beta} \nu_N(\lambda'/\lambda), \quad \text{where } B_\beta = \{B = \beta\}, \qquad (1.8)$$

stabilize (when $N \to \infty$) to conventional probabilities $P^{\rm conv}(A_\alpha)$, $P^{\rm conv}(B_\beta)$.
However, in general there are no reasons to suppose that the frequency

$$\nu_N(A_\alpha \cap B_\beta) = \sum_{\lambda \in \Lambda} \nu_N(\lambda) \sum_{\lambda' \in A_\alpha \cap B_\beta} \nu_N(\lambda'/\lambda) \qquad (1.9)$$

also stabilizes (when $N \to \infty$). If (1.9) does not stabilize, then the conventional
probability $P(A = \alpha, B = \beta)$ is not defined.
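
That frequencies (1.7) and (1.8) may stabilize while (1.9) does not can be seen already for two hidden states. The toy model below is an illustration under assumptions that are not in the text: the hidden states arrive in blocks of doubling length, so their own frequencies oscillate, and each hidden state produces its two possible outcomes $(\alpha, \beta)$ in turn. All names (`hidden_sequence`, `outcomes`, `frequencies`) are hypothetical.

```python
def hidden_sequence(num_blocks):
    """Hidden states in blocks of length 1, 2, 4, ..., alternating
    between state 1 and state 2."""
    seq = []
    for j in range(num_blocks):
        seq.extend([1 if j % 2 == 0 else 2] * (2 ** j))
    return seq

# Outcomes (alpha, beta) produced in turn by each hidden state, so that every
# conditional frequency nu_N(lambda'/lambda) tends to 1/2.
outcomes = {1: [(0, 0), (1, 1)], 2: [(0, 1), (1, 0)]}

def frequencies(num_blocks):
    """Return (nu_N(A=0), nu_N(B=0), nu_N(A=0, B=0)) after num_blocks blocks."""
    seen = {1: 0, 2: 0}
    count_a0 = count_b0 = count_joint = 0
    seq = hidden_sequence(num_blocks)
    for state in seq:
        alpha, beta = outcomes[state][seen[state] % 2]
        seen[state] += 1
        count_a0 += alpha == 0
        count_b0 += beta == 0
        count_joint += (alpha == 0 and beta == 0)
    n = len(seq)
    return count_a0 / n, count_b0 / n, count_joint / n

table = [frequencies(m) for m in range(10, 16)]
for row in table:
    print("%.4f  %.4f  %.4f" % row)
```

The first two columns stay close to 1/2, while the third keeps jumping between values near 1/6 and 1/3, because the joint event is fed by only one of the two hidden states.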

Remark 1.2. Suppose that we could find some procedure Re to regularize
fluctuating frequencies $\nu_N(\lambda)$ and (or) $\nu_N(\lambda'/\lambda)$. By Re we obtain generalized
probabilities $p_\lambda$ and (or) $p_{\lambda\lambda'}$ (which in principle can be negative numbers).
Suppose that (in the case of the infinite set $\Lambda$) we could find some procedure
${\rm Re}_{\rm conv}$ to regularize (probably diverging) series

$$\sum_{\lambda \in \Lambda} p_\lambda \sum_{\lambda' \in A_\alpha} p_{\lambda\lambda'}, \quad \sum_{\lambda \in \Lambda} p_\lambda \sum_{\lambda' \in B_\beta} p_{\lambda\lambda'} \qquad (1.10)$$

in such a way that their sums coincide with conventional probabilities $P^{\rm conv}(A_\alpha)$
and $P^{\rm conv}(B_\beta)$, respectively. We now apply ${\rm Re}_{\rm conv}$ to series

$$\sum_{\lambda \in \Lambda} p_\lambda \sum_{\lambda' \in A_\alpha \cap B_\beta} p_{\lambda\lambda'}. \qquad (1.11)$$

In principle there may be different variants: 1) the procedure ${\rm Re}_{\rm conv}$ does not work
for series (1.11); here we could not assign any real number to (1.11); 2) despite
fluctuations of frequencies (1.9), ${\rm Re}_{\rm conv}$ still works for series (1.11) and gives a
real number; but this number is not related to the statistical limit of frequencies
(1.9) (in particular, it may be a negative number).

⁵If $|\Lambda| = \infty$, then events $A_\alpha$ and $B_\beta$ may differ rather slightly:
$\nu_N(A_\alpha/\lambda) \approx \nu_N(B_\beta/\lambda)$ for each $\lambda \in \Lambda$. But the infinite average over $\Lambda$ can
produce behaviour of frequencies (1.9) which essentially differs from behaviour of
frequencies (1.7) and (1.8).


This simple statistical consideration explains the origin of difficulties
with ‘simultaneous existence’ of incompatible properties of quantum systems.
Therefore the presence of incompatible properties does not demonstrate some
essentially new ‘quantum’ properties of reality. It only demonstrates that
the law of large numbers is violated for internal (hidden) properties of
so-called quantum systems (mainly because we could not control the statistical
behaviour of these properties in our (macro) preparation procedures). For
some events fluctuations on the microlevel can compensate each other and
produce the statistical stabilization of observed frequencies (1.7) and (1.8).
At the present time such events are called physical events. For other events
fluctuations on the microlevel cannot compensate each other; there is no
statistical stabilization of observed frequencies (1.9). At the present time such
events are called nonphysical.

There are also no reasons to suppose that the (in general generalized) initial
probability distribution $p_\lambda$ and conditional probabilities $p_{\lambda\lambda'}$ can be chosen
in such a way that fluctuations in both expressions (1.7) and (1.8) could be
compensated so that, for some values $A = \alpha_0$ and $B = \beta_0$, both frequencies
$\nu_N(A_{\alpha_0})$ and $\nu_N(B_{\beta_0})$ stabilize to probability 1. This is nothing but the
statistical explanation of the principle of complementarity. It seems that
(rather unclear) considerations of R. Feynman [37] can be interpreted in such
a way.

Thus we proposed the purely statistical explanation of the phenomenon
of incompatibility for some quantum observables. Here the problem of
disturbance effects of measurements is totally excluded from considerations.
Our approach implies that even the possibility to perform measurements on
quantum systems without any disturbance effect would not imply that
incompatible properties can be measured simultaneously⁶. Different structure
of sets $\{\lambda' \in A_\alpha\}$, $\{\lambda' \in B_\beta\}$ and $\{\lambda' \in A_\alpha \cap B_\beta\}$ might still imply fluctuations
of frequencies (1.9).

⁶The idea that the presence of incompatible observables in the quantum formalism (and,
in particular, the Heisenberg uncertainty relation) is not a consequence of disturbance
effects of the process of a measurement, but a consequence of the internal statistical
structure of a quantum state (or a preparation procedure), has been intensively discussed
in quantum physics (Prugovecki [94], Ballentine [8]).

Thus the careful probabilistic considerations show that there may exist physical
(in the sense of the verification by the law of large numbers) properties A, B
such that the simultaneous existence of these properties could not be verified on
the physical level. In such a situation one of the possibilities is to exclude pairs
C = (A, B) of incompatible properties from considerations (this is the modern
quantum viewpoint)⁷. However, there is another possibility, namely, to consider
some regularization procedure R for (1.9). If (1.9) could be regularized via R,
then C can be considered as an R-physical property. Thus we can essentially extend
physical reality by considering new R-elements of reality. As we have already
remarked, in many cases one of the simplest ways to regularize (1.9) is to use the
p-adic topology, instead of the real one. Here frequencies $\nu_N(A_\alpha \cap B_\beta)$ may have
the limit in $Q_p$, despite fluctuations in R. However, the possibility of a p-adic (and
any other) regularization of (1.9) need not imply the possibility to use the same
regularization for (1.7) and (1.8). In principle A and B need not be elements of
new reality (despite the fact that C = (A, B) is an element of this reality).
Nevertheless, there may be coincidences such that all series

$$P^R(A_\alpha) = \sum_{\lambda \in \Lambda} p_\lambda \sum_{\lambda' \in A_\alpha} p_{\lambda\lambda'}, \quad P^R(B_\beta) = \sum_{\lambda \in \Lambda} p_\lambda \sum_{\lambda' \in B_\beta} p_{\lambda\lambda'}, \qquad (1.12)$$

$$P^R(A_\alpha \cap B_\beta) = \sum_{\lambda \in \Lambda} p_\lambda \sum_{\lambda' \in A_\alpha \cap B_\beta} p_{\lambda\lambda'} \qquad (1.13)$$

converge with respect to R. In such a situation all events $A_\alpha$, $B_\beta$, $A_\alpha \cap B_\beta$ are
R-physical events. It could be that R-probabilities $P^R(A_\alpha)$ and $P^R(B_\beta)$ coincide
with conventional probabilities $P^{\rm conv}(A_\alpha)$ and $P^{\rm conv}(B_\beta)$. However, in general
$P^R(A_\alpha) \neq P^{\rm conv}(A_\alpha)$ and (or) $P^R(B_\beta) \neq P^{\rm conv}(B_\beta)$. In the ensemble
framework the previous considerations can be interpreted in the following way. The
system of events $F(\pi_S)$ for the ensemble S need not be an algebra. The sets
$C_{\alpha\beta} = A_\alpha \cap B_\beta$ need not belong to $F(\pi_S)$. However, we may try to extend the
ensemble probability to a larger class of sets by using some regularization
procedures. Sometimes it is possible and sometimes it is impossible to define ensemble
probabilities for $C_{\alpha\beta}$ and preserve ensemble probabilities for sets $A_\alpha$ and $B_\beta$.
Thus the modern physics is based the Kolmogorov physical reality. This 
model of physical reality can be extended by considering non-Kolmogorov 
physical realities. We conclude our considerations by the equality: 


Model of Reality = Model of Probability. 


We now consider the principle of complementarity in the framework of 
f-realism. The main difference between z-realism and f-realism is that in the 


7 E. Prugovecki pointed out [94] that, far from restricting simultaneous measurements of 
noncommuting observables, quantum theory does not deal with them at all; its formalism 
being capable only of statistically predicting the results of measurements of one observable 
(or a commuting set of observables). 
first case we can assume that the conditional probabilities $p_{\lambda\lambda'}$ do not depend on 
a measured property, and in the second case a measurement of a property D 
produces $p_{\lambda\lambda'} = p^{D}_{\lambda\lambda'}$. Here 


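$$\nu_N(A_\alpha) = \sum_{\lambda \in \Lambda} \nu_N(\lambda) \sum_{\lambda' \in A_\alpha} \nu_N(\lambda'/\lambda; A), \tag{1.14}$$

$$\nu_N(B_\beta) = \sum_{\lambda \in \Lambda} \nu_N(\lambda) \sum_{\lambda' \in B_\beta} \nu_N(\lambda'/\lambda; B), \tag{1.15}$$

$$\nu_N(A_\alpha \cap B_\beta) = \sum_{\lambda \in \Lambda} \nu_N(\lambda) \sum_{\lambda' \in A_\alpha \cap B_\beta} \nu_N(\lambda'/\lambda; C), \qquad C = (A,B). \tag{1.16}$$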


Here (even for finite sets A of hidden variables) the statistical stabilization 
of frequencies (1.14) and (1.15) need not imply the statistical stabilization of 
frequencies (1.16). 

8. History of negative probabilities in physics. The possibility of 
obtaining negative probabilities via a regularization of the ensemble and frequency 
approximations (1.1) and (1.3), respectively, is so natural that the negative attitude 
towards negative probabilities in physics can only be explained by the common use 
of Kolmogorov's theory of probability. From the frequency viewpoint this use implies 
the common view that relative frequencies must always stabilize; from the 
ensemble viewpoint it implies that statistical ensembles of physical systems 
must always have a homogeneous structure with respect to all their nonobserved 
properties. A negative psychological reaction to the appearance of negative 
probabilities in physical models implies the desire to forget papers in which negative 
probabilities play a fundamental role. 

Although it is well known, for instance, that P. A. M. Dirac was the first 
to introduce explicitly the concept of negative energy, the number of those who 
know his investigations [34] of negative probability (closely related to negative 
energy and invented simultaneously) seems to be very small. This concept is 
used with reservations but, as it seems, not without a certain kind of sympathy. 
The said paper (see section ? for the details) is not the only one on this topic that 
has meanwhile been forgotten, at least as far as negative probability is concerned. 

Another example is the famous Wigner distribution [111] W(q, p), which was 
introduced as a probability distribution (see section 4) and has no physical 
interpretation other than as a probability distribution. However, the appearance of 
negative probabilities for some quantum states means that Wigner's distribution 
is no longer interpreted as a probability distribution (many physicists prefer to call 
W(q, p) Wigner's function). 
In the framework of the EPR experiments, violations of Bell’s inequality could 
be easily explained if we suppose that there exist negative probability distributions. 
However, papers on the negative probability description of the EPR experiments (see, 
for example, the review of W. Muckenheim [90]) did not play a large role in the 
polemics on the EPR experiments. Physicists prefer to accept the death of reality 
(namely, the impossibility of using realism in the quantum world; thus, in fact, the 
absence of objective laws in reality) or the nonlocality of space-time rather than to use 
negative probabilities. 

The existence of quantum observables with continuous spectra is in evident 
contradiction with the discreteness of the results of real physical measurements. E. 
Prugovecki [94] developed a theory of quantum measurements with finite 
precision (which takes into account reading errors of individual measurements). One 
of the great advantages of this theory is the possibility of describing simultaneous 
measurements of incompatible observables. However, negative probabilities appear 
again 8. As always, this led to extremely strong criticism of the theory. 


2 Signed ‘probabilistic’ measures and Einstein-Podolsky-Rosen paradox 


We start this section with a brief mathematical introduction to the theory 
of signed measures (charges). Let $\Omega$ be a set and let $F$ be a $\sigma$-algebra 
of its subsets. A $\sigma$-additive function $\mu : F \to \mathbf{R}$ is said to be a signed 
measure (charge). Thus $\mu(\bigcup_{n=1}^{\infty} A_n) = \sum_{n=1}^{\infty} \mu(A_n)$ for any sequence $A_n \in F$ 
with $A_i \cap A_j = \emptyset$, $i \neq j$. 

Example 2.1. (Discrete measures) Let $\Omega = \{x_1, x_2, \ldots, x_n, \ldots\}$ be a 
countable set and let $F$ be the $\sigma$-algebra of all subsets of $\Omega$. Let $\{a_n\}_{n=1}^{\infty}$ be a 
sequence of real numbers such that $\sum_{n=1}^{\infty} |a_n| < \infty$. We set $\mu(\{x_n\}) = a_n$ and 
$\mu(A) = \sum_{x_n \in A} \mu(\{x_n\})$ for $A \in F$. Then $\mu : F \to \mathbf{R}$ is a signed measure. On 
the basis of this simple example we illustrate some important notions of the 
general theory of signed measures. Set $\Omega_- = \{x_j \in \Omega : \mu(\{x_j\}) < 0\}$, 
$\Omega_+ = \{x_j \in \Omega : \mu(\{x_j\}) > 0\}$ and $N = \{x_j \in \Omega : \mu(\{x_j\}) = 0\}$. It is evident that 
for any $E \in F$: 

$$\mu(E \cap \Omega_-) \le 0 \quad \text{and} \quad \mu(E \cap \Omega_+) \ge 0. \tag{2.1}$$


8 This has a natural explanation on the basis of our interpretation of negative 
probabilities: the violation of the law of large numbers for such measurements. 
Let $U, V \in F$, $U \cap V = \emptyset$, and let $N = U \cup V$. Set $\Omega'_- = \Omega_- \cup U$ and 
$\Omega'_+ = \Omega_+ \cup V$ (thus $\Omega = \Omega'_- \cup \Omega'_+$). Then the sets $\Omega'_-$ and $\Omega'_+$ have the same 
property (2.1) as the sets $\Omega_-$ and $\Omega_+$. Set 

$$\mu^-(E) = -\mu(E \cap \Omega'_-) = \sum_{x_n \in E \cap \Omega'_-} |a_n| \quad \text{and} \quad \mu^+(E) = \mu(E \cap \Omega'_+) = \sum_{x_n \in E \cap \Omega'_+} a_n.$$

Then $\mu(E) = \mu^+(E) - \mu^-(E)$. This representation of $\mu$ is unique (in spite of the 
nonuniqueness of the representation $\Omega = \Omega'_- \cup \Omega'_+$). We can associate with a 
signed measure $\mu$ the positive measure $|\mu| = \mu^+ + \mu^-$, $|\mu|(A) = \sum_{x_n \in A} |a_n|$. 

In fact, this particular example demonstrates all the main features of signed 
measures. We now consider the general case. 
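
The notions of Example 2.1 can be tried out on a small finite set. The following is only a sketch: the six weights are hypothetical values chosen for illustration, and the helper names (`mu`, `variations`) are not taken from the text. It checks that two different splittings of the null points give the same positive and negative variations.

```python
from fractions import Fraction

# Hypothetical weights a_n = mu({x_n}) on six points x_0..x_5 (illustration only).
weights = {0: Fraction(1, 2), 1: Fraction(-1, 4), 2: Fraction(0),
           3: Fraction(1, 8), 4: Fraction(-1, 16), 5: Fraction(0)}

def mu(event):
    """Signed measure of an event, given as a set of point labels."""
    return sum((weights[x] for x in event), Fraction(0))

omega = set(weights)
omega_minus = {x for x in omega if weights[x] < 0}
omega_plus = {x for x in omega if weights[x] > 0}
null_set = {x for x in omega if weights[x] == 0}

def variations(event, neg_part, pos_part):
    """Return (mu^+, mu^-) of an event for a given splitting of omega."""
    return mu(event & pos_part), -mu(event & neg_part)

# Two different splittings: the null points may be attached to either side.
split_a = (omega_minus | null_set, omega_plus)
split_b = (omega_minus, omega_plus | null_set)

event = {0, 1, 2, 4}
plus_a, minus_a = variations(event, *split_a)
plus_b, minus_b = variations(event, *split_b)
total_variation = plus_a + minus_a

print(mu(event), plus_a, minus_a, total_variation)
```

Exact rational arithmetic (`Fraction`) is used so that the identity mu = mu^+ - mu^- can be compared without rounding.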

Definition 2.1. Let $\mu$ be a signed measure defined on a $\sigma$-algebra $F$ of 
subsets of a space $\Omega$. Then the set $A \subset \Omega$ is said to be negative with respect 
to $\mu$ if $E \cap A \in F$ and $\mu(E \cap A) \le 0$ for every $E \in F$. Similarly, A is said 
to be positive with respect to $\mu$ if $E \cap A \in F$ and $\mu(E \cap A) \ge 0$ for every 
$E \in F$. 

Theorem 2.1. (Hahn-Jordan) Given a signed measure $\mu$ on a $\sigma$-algebra 
$F$ of subsets of $\Omega$, there exists a set $\Omega_- \in F$ such that $\Omega_-$ is negative and 
$\Omega_+ = \Omega \setminus \Omega_-$ is positive with respect to $\mu$. 

Proof. Let $a = \inf \mu(A)$, where the greatest lower bound is taken over all 
negative sets $A \in F$. Let $A_n \in F$, $n = 1, 2, \ldots$, be a sequence of negative sets 
such that $\lim_{n \to \infty} \mu(A_n) = a$. Then the set $\Omega_- = \bigcup_{n=1}^{\infty} A_n \in F$ is a negative 
set such that $\mu(\Omega_-) = a$ (this is a consequence of the $\sigma$-additivity of $\mu$). To show 
that $\Omega_-$ is the required set, we need only show that $\Omega_+ = \Omega \setminus \Omega_-$ is positive. 
It is possible to show that the assumption that $\Omega_+$ is not positive leads to a 
contradiction (see, for example, [78] for the details). □ 

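Thus we can represent $\Omega$ as a union 

$$\Omega = \Omega_+ \cup \Omega_- \tag{2.2}$$

of two disjoint measurable sets $\Omega_+$ and $\Omega_-$, where $\Omega_+$ is positive and $\Omega_-$ is 
negative with respect to the signed measure $\mu$. The representation (2.2) is 
called the Hahn decomposition of $\Omega$, and it need not be unique. However, if 

$$\Omega = \Omega^1_+ \cup \Omega^1_-, \qquad \Omega = \Omega^2_+ \cup \Omega^2_-$$

are two distinct Hahn decompositions of $\Omega$, then 

$$\mu(E \cap \Omega^1_-) = \mu(E \cap \Omega^2_-), \qquad \mu(E \cap \Omega^1_+) = \mu(E \cap \Omega^2_+) \tag{2.3}$$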
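for every $E \in F$. In fact, $E \cap (\Omega^1_- \setminus \Omega^2_-) \subset E \cap \Omega^1_-$ and at the same time 
$E \cap (\Omega^1_- \setminus \Omega^2_-) \subset E \cap \Omega^2_+$. This implies that 

$$\mu(E \cap (\Omega^1_- \setminus \Omega^2_-)) \le 0 \quad \text{and} \quad \mu(E \cap (\Omega^1_- \setminus \Omega^2_-)) \ge 0.$$

Thus $\mu(E \cap (\Omega^1_- \setminus \Omega^2_-)) = 0$, and similarly $\mu(E \cap (\Omega^2_- \setminus \Omega^1_-)) = 0$. Therefore 

$$\mu(E \cap \Omega^1_-) = \mu(E \cap (\Omega^1_- \setminus \Omega^2_-)) + \mu(E \cap (\Omega^1_- \cap \Omega^2_-))$$

$$= \mu(E \cap (\Omega^2_- \setminus \Omega^1_-)) + \mu(E \cap (\Omega^1_- \cap \Omega^2_-)) = \mu(E \cap \Omega^2_-),$$

which proves the first of the formulas (2.3). The second formula is proved in 
exactly the same way. 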

Thus a signed measure $\mu$ on the space $\Omega$ uniquely determines two 
nonnegative set functions, namely 

$$\mu^+(E) = \mu(E \cap \Omega_+), \qquad \mu^-(E) = -\mu(E \cap \Omega_-),$$

called the positive variation and negative variation of $\mu$, respectively. It is 
clear that 

1) $\mu = \mu^+ - \mu^-$; 

2) $\mu^+$ and $\mu^-$ are nonnegative $\sigma$-additive set functions, i.e., measures; 

3) The set function $|\mu| = \mu^+ + \mu^-$, called the total variation of $\mu$, is also 
a measure. 

The representation $\mu = \mu^+ - \mu^-$ is called the Jordan decomposition of $\mu$. 

We can present a formal generalization of Kolmogorov's measure-theoretical 
approach. We define a signed probability space as the triple $\mathcal{P} = (\Omega, F, P)$, 
where $\Omega$ is an arbitrary set (points $\omega$ of $\Omega$ are said to be elementary events), 
$F$ is an arbitrary $\sigma$-algebra of subsets of $\Omega$ (elements of $F$ are said to be 
events), and $P$ is a $\sigma$-additive signed measure (a charge) on $F$ normalized by the 
condition $P(\Omega) = 1$. 

It is a generalization of probability. There can be events which have 
negative probabilities and events with probabilities larger than 1. However, our 
considerations in subsection 1 give strong motivations to use signed probability 
spaces in physics. Moreover, there are analogues of the law of large numbers 
and the central limit theorem for signed probabilities (see [9], [51], [52]), which 
further support the use of signed probability spaces. 

There are no physical reasons to assume that even in the case of signed 
probabilities the system of events has the structure of a set algebra (see also Chapter 
4). It is natural to consider signed probability semi-measures defined on set 
semi-algebras (the reader can obtain the definition of a signed probability semi-measure 
by analogy with Definition 5.2 of Chapter 1). However, it seems that the 
corresponding mathematical formalism has not yet been developed. In particular, I do not 
know anything about the possibility of obtaining the Jordan decomposition for signed 
semi-measures. 

As it has been already mentioned, some physicists (see, for example, [90]) 
assume that probability distributions involved in Bell’s considerations are 
signed probability measures. This assumption implies that we could not use 
the standard probabilistic estimates. Therefore there is no Bell’s inequal- 
ity at all. From this viewpoint experiments for testing Bell’s inequality can 
be considered as experiments for testing foundations of probability 
theory. 

We now discuss carefully the origin of negative probabilities in the EPR 
framework. Let us follow the ideology of hidden variables. Consider a number 
N of particles prepared in a pure quantum state and possessing hidden 
variables $\lambda_k$, $k = 0, \ldots, n$. Assume that the different values $\lambda_k$ are taken 
with (probably generalized) probabilities $p_k$. By an interaction (the nature 
of which need not be specified) the values of these hidden variables change 
from $\lambda_k$ to $\lambda'_j$, $j = 0, \ldots, m$, the transition probability being denoted by $p_{kj}$. 
By this interaction the pure state may split into $l \le m$ experimentally 
distinguishable states. Let A be one of these states. The set of values of j 
such that the $\lambda'_j$ form the state A is denoted by the symbol j(A). The result 
of a measurement exhibits N(A) particles in the state A and gives the relative 
frequencies $\nu_N(A) = N(A)/N$. By statistical stabilization of these frequencies we 
obtain the frequency probabilities $P^{\rm Mises}(A) = \lim_{N \to \infty} \nu_N(A)$. The combined 
transition probability for the state A can be found with the aid of the formula 
of total probability: 


$$P^{\rm conv}(A) = \lim_{n \to \infty} \sum_{k=0}^{n} p_k \sum_{j \in j(A)} p_{kj} = \sum_{k=0}^{\infty} p_k \sum_{j \in j(A)} p_{kj}. \tag{2.4}$$
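
For a finite set of hidden values the formula of total probability is an ordinary double sum. The following sketch evaluates it exactly; the three probabilities `p`, the transition matrix `p_trans` and the index set `j_of_A` are hypothetical numbers chosen for illustration and do not come from the text.

```python
from fractions import Fraction

# Hypothetical data: three hidden values lambda_k, four final values lambda'_j.
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]            # p_k
p_trans = [                                                      # p_kj, rows sum to 1
    [Fraction(1, 4), Fraction(1, 4), Fraction(1, 4), Fraction(1, 4)],
    [Fraction(1, 2), Fraction(0), Fraction(1, 2), Fraction(0)],
    [Fraction(0), Fraction(0), Fraction(0), Fraction(1)],
]
j_of_A = {0, 2}   # indices j whose values lambda'_j form the state A

def combined_probability(p, p_trans, indices):
    """Sum over k of p_k times the sum over j in `indices` of p_kj."""
    return sum(p[k] * sum(p_trans[k][j] for j in indices) for k in range(len(p)))

prob_A = combined_probability(p, p_trans, j_of_A)
prob_complement = combined_probability(p, p_trans, {1, 3})
print(prob_A, prob_complement)
```

With ordinary (nonnegative, normalized) inputs the two values add up to 1; the pathologies discussed in the text arise only when the sums become infinite or the frequencies fail to stabilize.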


All probabilistic considerations on Bell’s inequality are based on the 
assumption that the observed frequency probabilities $P^{\rm Mises}(A)$ must coincide 
with the combined transition probabilities $P^{\rm conv}(A)$ (defined by (2.4)). By this 
assumption we can use the hidden probabilities $p_k$, $p_{kj}$ in calculations related 
to Bell’s inequality. However, as has already been mentioned in section 1, 
the formula of total probability (2.4) can contain some pathologies. These 
pathologies could in principle be eliminated by some regularization procedure 
R 9. However, R can produce nonconventional probabilities $p_k$, $p_{kj}$. 

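The problem of fluctuating frequencies $\nu_N(\lambda_k)$ and (or) $\nu_N(\lambda'_j/\lambda_k)$ has 
already been discussed in section 1 (see also Chapter 2). We now pay attention 
to the average over an infinite set of hidden variables $\Lambda$. So let $|\Lambda| = \infty$. 
We have 

$$P^{\rm Mises}(A) = \lim_{N \to \infty} \nu_N(A) = \lim_{N \to \infty} \lim_{n \to \infty} \sum_{k=1}^{n} \nu_N(\lambda_k) \sum_{j \in j(A)} \nu_N(\lambda'_j/\lambda_k), \tag{2.5}$$

where $\lambda_1, \ldots, \lambda_n$, $n = n_N$, are the hidden states of the particles $s_1, \ldots, s_N$. On the other 
hand, we have 

$$P^{\rm conv}(A) = \lim_{n \to \infty} \sum_{k=1}^{n} \lim_{N \to \infty} \nu_N(\lambda_k) \sum_{j \in j(A)} \lim_{N \to \infty} \nu_N(\lambda'_j/\lambda_k)$$

$$= \lim_{n \to \infty} \lim_{N \to \infty} \sum_{k=1}^{n} \nu_N(\lambda_k) \sum_{j \in j(A)} \nu_N(\lambda'_j/\lambda_k).$$

To obtain the equality $P^{\rm Mises}(A) = P^{\rm conv}(A)$, we have to change the order of 
limits: 

$$\lim_{N \to \infty} \lim_{n \to \infty} \;\to\; \lim_{n \to \infty} \lim_{N \to \infty}.$$

However, we cannot do this in the general case. First of all, as we have 
already discussed, it may be that $P^{\rm Mises}(A) = \lim_{N \to \infty} \nu_N(A)$ exists but some 
of the limits $\lim_{N \to \infty} \nu_N(\lambda_k)$ or $\lim_{N \to \infty} \nu_N(\lambda'_j/\lambda_k)$ do not exist. On the other 
hand, it may be that, for example, all $p_k^{\rm Mises} = \lim_{N \to \infty} \nu_N(\lambda_k) = 0$, i.e., 

$$\lim_{n \to \infty} \lim_{N \to \infty} \sum_{k=1}^{n} \nu_N(\lambda_k) \sum_{j \in j(A)} \nu_N(\lambda'_j/\lambda_k) = 0.$$

But at the same time $P^{\rm Mises}(A) \neq 0$. However, it is possible to justify (in some 
cases) the change of the order of limits with the aid of some regularization 
procedure. 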
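
The failure of the interchange of limits can be seen in a toy model. The sketch below rests on assumptions that are not in the text: each of the N particles carries its own hidden state (so n_N = N and every state has frequency 1/N), and a particle ends up in the state A exactly when its index k is even. Under these assumptions the observed frequency of A tends to 1/2, while every truncated sum over the first n states tends to 0 as N grows.

```python
from fractions import Fraction

def nu_state(k, N):
    """Frequency of hidden state lambda_k among N particles, one state per particle."""
    return Fraction(1, N) if 1 <= k <= N else Fraction(0)

def nu_cond(k):
    """Conditional frequency of landing in A given lambda_k: 1 for even k, else 0."""
    return Fraction(1) if k % 2 == 0 else Fraction(0)

def nu_A(N):
    """Observed frequency of A: the sum over all n_N = N occupied states."""
    return sum(nu_state(k, N) * nu_cond(k) for k in range(1, N + 1))

def truncated_sum(n, N):
    """Sum over the first n hidden states only, at fixed N."""
    return sum(nu_state(k, N) * nu_cond(k) for k in range(1, n + 1))

print(nu_A(1000))                 # stays at 1/2 for even N
print(truncated_sum(10, 10**3))   # shrinks as N grows with n fixed
print(truncated_sum(10, 10**6))
```

Taking N to infinity first (at fixed n) kills every term, so the outer limit in n gives 0; taking the full sum first gives 1/2.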


9 In fact, R consists of two regularization procedures: 1) $R_N$ gives a regularization 
($N \to \infty$) of (in general fluctuating) frequencies; 2) $R_{\rm conv}$ gives a regularization ($n \to \infty$) 
of the (in general infinite) average over $\Lambda$. 


Negative probabilities 125 


We now consider the ensemble approach to find the origin of negative 
probabilities. Let us start with the following example. 

Example 2.2. (Negative distribution of hidden variables) The hidden 
variable $\lambda$ has an infinite number of values $\Lambda = \{\lambda_0, \ldots, \lambda_n, \ldots\}$. A statistical 
ensemble S contains $n(\lambda_l) = 2^l$, $l = 0, 1, \ldots$, particles with $\lambda = \lambda_l$. Let us 
consider the sub-ensemble $S^{(n)}$ of S which contains all particles with 
$\lambda \in \{\lambda_0, \ldots, \lambda_n\}$. Thus $|S^{(n)}| = 1 + \cdots + 2^n = 2^{n+1} - 1$ and 
$p_k^{(n)} = P_{S^{(n)}}(\lambda_k) = 2^k/(2^{n+1} - 1)$, $0 \le k \le n$. The formula of total probability 
for the ensemble $S^{(n)}$ has the form: 

$$P_{S^{(n)}}(A) = \sum_{k=0}^{n} p_k^{(n)} \sum_{j \in j(A)} p_{kj}$$

(here it is assumed that the conditional probabilities $p_{kj}$ depend only on the 
interaction; they do not depend on n). If $n \to \infty$, then $S^{(n)} \to S$ and 
$\lim_{n \to \infty} P_{S^{(n)}}(A) = P_S(A)$. However, for the probabilities $p_k$ with respect to the 
ensemble S, we have $p_k = \lim_{n \to \infty} p_k^{(n)} = 0$. Thus, in general, 

$$P_S(A) = \lim_{n \to \infty} \sum_{k=0}^{n} p_k^{(n)} \sum_{j \in j(A)} p_{kj} \neq \sum_{k=0}^{\infty} \lim_{n \to \infty} p_k^{(n)} \sum_{j \in j(A)} p_{kj} = 0.$$
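
A quick exact check of the first half of Example 2.2 is sketched below. The function names are hypothetical, and the simplifying assumption that the inner sum over j in j(A) equals the same number `q_A` for every k is made only for this illustration.

```python
from fractions import Fraction

def p_sub(k, n):
    """p_k^{(n)}: share of particles with lambda_k in the sub-ensemble S^{(n)}."""
    return Fraction(2**k, 2**(n + 1) - 1)

def total_probability(n, q_A):
    """Formula of total probability on S^{(n)} when the inner sum equals q_A for all k."""
    return sum(p_sub(k, n) * q_A for k in range(n + 1))

# The total probability does not depend on n ...
print(total_probability(5, Fraction(1, 3)), total_probability(50, Fraction(1, 3)))
# ... although each individual p_k^{(n)} tends to zero as n grows.
print(float(p_sub(3, 10)), float(p_sub(3, 50)))
```

The sum over k is always carried by the last few terms, which is why taking the limit term by term loses the whole probability.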


We make some formal computations (which, of course, have no meaning in 
the framework of real analysis). First, we find the ‘number of particles’ in 
S: 

$$|S| = \sum_{k=0}^{\infty} 2^k = \frac{1}{1-2} = -1. \tag{2.6}$$

Then we find the probabilities 

$$p_k = \frac{|S(\lambda = \lambda_k)|}{|S|} = -2^k. \tag{2.7}$$

Here $\sum_{k=0}^{\infty} p_k = 1$. Thus we have obtained negative ensemble probabilities. We 
can apply the ensemble formula of total probability to these probabilities: 

$$P_S(A) = \sum_{k=0}^{\infty} p_k \sum_{j \in j(A)} p_{kj} \tag{2.8}$$

(at the moment we assume that the conditional probabilities are ordinary positive 
probabilities, $p_{kj} \ge 0$). Let, for example, $p_{kj} = q_j \ge 0$ not depend 
on k. Then $P_S(A) = \left(\sum_{k=0}^{\infty} p_k\right)\left(\sum_{j \in j(A)} q_j\right) = \sum_{j \in j(A)} q_j \ge 0$ is an ordinary 
probability (in spite of the presence of negative probabilities). We can also 
consider k-dependent conditional probabilities $p_{kj}$. Let, for example, 
$A = \{\lambda'_0\}$ and $p_{k0} = 0$, $k = 2l+1$, $p_{k0} = 1/2^{l+s}$, $k = 2l$, where $s = 0, 1, \ldots$ is some 
(fixed) parameter of the model. Then 

$$P_S(A) = \sum_{l=0}^{\infty} p_{2l}\, p_{(2l)0} = -\frac{1}{2^s} \sum_{l=0}^{\infty} 2^l = \frac{1}{2^s}$$

is an ordinary probability. 
Let $p_{k0}$ be the same as above and let $p_{k1} = 1$, $k = 2l+1$, $p_{k1} = 
(1 - 1/2^{l+s})$, $k = 2l$. Set $B = \{\lambda'_1\}$. Then 

$$P_S(B) = \sum_{l=0}^{\infty} p_{2l+1}\, p_{(2l+1)1} + \sum_{l=0}^{\infty} p_{2l}\, p_{(2l)1}$$

$$= -\left( \sum_{l=0}^{\infty} 2^{2l+1} + \sum_{l=0}^{\infty} 2^{2l} - \frac{1}{2^s} \sum_{l=0}^{\infty} 2^l \right) = 1 - \frac{1}{2^s}.$$

Of course, these ‘generalized probabilities’ have some properties which 
contradict common probabilistic intuition. For example, let 
$s = 0$. Then $P_S(A) = 1$, $P_S(B) = 0$ (despite the fact that $p_{k1} = 1$ and 
$p_k \neq 0$, $k = 2l+1$). 

In Chapter 4 we shall see that all these formal manipulations can be carried out 
rigorously in the p-adic probabilistic framework. In 
particular, from the p-adic viewpoint the probabilities $p_k = -2^k$ are infinitely small 
probabilities. Thus in the ensemble S the proportion of systems having the fixed 
value $\lambda_k$ of $\lambda$ is infinitely small. All these infinitely small probabilities must be 
identified with zero probability in the conventional probability theory. 
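
The formal sums of Example 2.2 can be checked numerically for p = 2. The sketch below assumes the standard 2-adic norm on the rationals and is not the construction of Chapter 4; the helper names are hypothetical. It shows that the partial sums of 1 + 2 + 4 + ... approach -1, and that the partial sums of the series for P_S(A) approach 2^(-s) (the value that gives P_S(A) = 1 for s = 0, as stated in the text), in the sense that the 2-adic norm of the difference shrinks.

```python
from fractions import Fraction

def two_adic_norm(q):
    """Standard 2-adic norm |q|_2 of a rational number q (0 for q == 0)."""
    q = Fraction(q)
    if q == 0:
        return Fraction(0)
    num, den, v = q.numerator, q.denominator, 0
    while num % 2 == 0:      # count factors of 2 in the numerator
        num //= 2
        v += 1
    while den % 2 == 0:      # and subtract those in the denominator
        den //= 2
        v -= 1
    return Fraction(1, 2**v) if v >= 0 else Fraction(2**(-v))

def ensemble_size(n):
    """|S^{(n)}| = 1 + 2 + ... + 2^n."""
    return sum(2**k for k in range(n + 1))

def prob_A_partial(n, s):
    """Partial sum over l <= n of p_{2l} * p_{(2l)0}, with p_k = -2^k and p_{(2l)0} = 2^-(l+s)."""
    return sum(Fraction(-(2**(2 * l)), 2**(l + s)) for l in range(n + 1))

for n in (5, 10, 20):
    print(n, two_adic_norm(ensemble_size(n) + 1),
          two_adic_norm(prob_A_partial(n, 3) - Fraction(1, 8)))
```

In the real metric both sequences of partial sums diverge; only the choice of the 2-adic norm makes the differences small.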

Example 2.3. (Negative conditional probabilities and negative 
probabilities for hidden variables). Negative conditional probabilities $p_{kj}$ may also 
appear in quite natural statistical ensembles. We assume that the 
interaction which determines the transition $\lambda_k \to \lambda'_j$ can be represented as a finite 
chain of steps (trajectory), (x), and that at each step a particle can have one of 
two states, 0 or 1. Thus a trajectory of the interaction with n steps has the 
form $(x)_n = (u_1, \ldots, u_n)$, $u_i = 0, 1$. In our model we simply assume that the 
transition $\lambda_k \to \lambda'_l$ is realized via a trajectory of length l (thus, for fixed 
l, the conditional probabilities $p_{kl}$ do not depend on k). Consider the statistical 
ensemble $G_l$ of trajectories having the length l, where $l = 0, 1, 2, \ldots$ (we 
also consider a ‘trajectory’ of length $l = 0$, which describes the direct transition 
$\lambda_k \to \lambda'_0$). Set $G^{(n)} = \bigcup_{l=0}^{n} G_l$. Then $|G_l| = 2^l$, $|G^{(n)}| = 2^{n+1} - 1$ and 
$|G| = \lim_{n \to \infty} |G^{(n)}| = \sum_{l=0}^{\infty} 2^l = -1$. Thus 

$$p_{kl} = \frac{|G_l|}{|G|} = -2^l.$$

Suppose that, as in the above examples, $p_k = -2^k$ and that an experimentally 
distinguishable state A is determined by the values $\lambda'_{2k+1}$, $k = 0, 1, \ldots$, i.e., 
$A = \{\lambda'_1, \ldots, \lambda'_{2k+1}, \ldots\}$. By the formula of total probability we have 

$$P_S(A) = \sum_{l=0}^{\infty} \sum_{j=0}^{\infty} 2^l\, 2^{2j+1} = \frac{2}{3}.$$

However, for $A_j = \{\lambda'_j\}$, $P_S(A_j) = -2^j < 0$. 

We shall see in the p-adic framework that such probabilities can be interpreted 
as infinitely small (but nonzero!) quantities. Thus in this model not only is the 
probability of obtaining $\lambda' = \lambda'_j$ for fixed j infinitely small, but the probability of each 
transition $\lambda_k \to \lambda'_j$ is also infinitely small. 

We can easily modify the above example and introduce conditional 
probabilities $p_{kj}$ which depend on k. 

Example 2.4. (Negative conditional probabilities and positive 
probabilities for hidden variables) Assume that the interaction which determines 
the transition $\lambda_k \to \lambda'_l$ can be represented as a chain of l steps 
(trajectory), $(x)_l$. Assume that at each step a particle can have one of the states 
$d \in D_l = \{d_1, \ldots, d_l\}$ and that each state $d \in D_l$ can appear in a trajectory $(x)_l$ 
only once. Thus a trajectory for the transition $\lambda_k \to \lambda'_l$ has the form 
$(x)_l = (u_1, \ldots, u_l)$, $u_i \in D_l$, $u_i \neq u_j$, $i \neq j$, i.e., $(x)_l = \sigma(d_1, \ldots, d_l)$ is a 
permutation of the elements of the set $D_l$. It is also assumed that the sets of states 
$D_l$ satisfy the condition of consistency: $D_{l+1} = D_l \cup \{d_{l+1}\}$. We now consider 
the following statistical ensembles: $G_l$, $l = 1, 2, \ldots$ (all trajectories of 
length l); $G^{(n)} = \bigcup_{l=1}^{n} G_l$ (all trajectories of length $\le n$); $G = \bigcup_{l=1}^{\infty} G_l$ (all 
trajectories of finite length). Then $|G_l| = l!$, $|G^{(n)}| = \sum_{k=1}^{n} k!$. Therefore 
we obtain that in the framework of real analysis 

$$P_{G^{(n)}}(\lambda'_l/\lambda_k) = \frac{|G_l|}{|G^{(n)}|} \to 0, \quad n \to \infty,$$


128 Chapter 3 


i.e., $P_G(\lambda'_l/\lambda_k) = 0$ in the conventional probability theory. In such a 
situation (even if the hidden variable $\lambda$ has an ordinary Kolmogorov probability 
distribution; for example, $p_k = 1/2^{k+1}$, $k = 0, 1, \ldots$) we obtain (of course, 
only formally) that 

$$P_S(A) = \lim_{n \to \infty} \sum_{k=0}^{\infty} p_k \sum_{l \in l(A)} P_{G^{(n)}}(\lambda'_l/\lambda_k) = \sum_{k=0}^{\infty} p_k \sum_{l \in l(A)} \lim_{n \to \infty} P_{G^{(n)}}(\lambda'_l/\lambda_k) = 0.$$

However, if we justify (via some summation procedure) the calculation 
$|G| = \sum_{k=1}^{\infty} k!$ (in particular, in the p-adic framework), then the (nonconventional) 
probabilities 

$$p_{kl} = \frac{l!}{\sum_{k=1}^{\infty} k!} \neq 0 \tag{2.9}$$

are well defined and the formula of total probability can be applied to these 
probabilities. 
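
That the series of factorials is summable p-adically can be made plausible with integer arithmetic alone. The sketch below is an illustration, not the construction used in the book: it relies on the standard fact that the power of p dividing k! grows without bound, so the partial sums 1! + 2! + ... + n! eventually agree modulo any fixed power of p. The function names and the choice p = 5 are hypothetical.

```python
from math import factorial

def factorial_partial_sum(n):
    """|G^{(n)}| = 1! + 2! + ... + n!."""
    return sum(factorial(k) for k in range(1, n + 1))

def p_adic_valuation(m, p):
    """Largest v such that p**v divides the nonzero integer m."""
    v = 0
    while m % p == 0:
        m //= p
        v += 1
    return v

def residue_stabilizes(p, power, n_values):
    """True if the partial sums agree modulo p**power for all n in n_values."""
    residues = {factorial_partial_sum(n) % p**power for n in n_values}
    return len(residues) == 1

# 20! is the first factorial divisible by 5**4, so from n = 19 on
# the partial sums no longer change modulo 5**4.
print(p_adic_valuation(factorial(20), 5))
print(residue_stabilizes(5, 4, range(19, 40)))
print(residue_stabilizes(5, 4, range(18, 40)))
```

In the real metric the same partial sums grow without bound, which is why the ratio l!/|G^(n)| tends to 0 there.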


3 Wigner phase-space distribution and negative probability 


Even in non-relativistic quantum mechanics negative probabilities creep into 
the picture. To formulate a conventional (Maxwellian) probability distribution 
of the coordinates x and momenta p, similarly to statistical mechanics, 
is plainly excluded by the corresponding uncertainty relation, which prevents 
at least the simultaneous knowledge of these quantities. Wigner and Szilard, 
however, found a distribution function which was first applied 
by Wigner in order to calculate the quantum correction to the gas pressure 
formula. If a wave function $\psi(x_1, \ldots, x_n)$, abbreviated by $\psi(x)$, is given, the 
corresponding Wigner function reads 

$$P(x,p) = (\pi\hbar)^{-n} \int_{-\infty}^{\infty} dy\, \psi^*(x+y)\, \psi(x-y) \exp\{2i(p,y)/\hbar\} \tag{3.1}$$

with x, y and p vectors having as many components as the configuration 
space of $\psi$, namely n, and where (p, y) denotes the scalar product. In order 
to demonstrate the fundamental features of the Wigner function relevant 
for the present purpose, it is sufficient to consider a single particle in linear 
motion. Thus n = 1 and the vector symbols will be dropped henceforth. The 
Wigner function exhibits remarkable similarities to a probability distribution 
in that it leads to the correct probabilities for the coordinates when integrated 
with respect to the momenta (the integration range is always understood to 
be $(-\infty, \infty)$ unless indicated otherwise), 

$$\int P(x,p)\, dp = |\psi(x)|^2, \tag{3.2}$$

and, vice versa, it gives the proper probabilities for the momenta when 
integrated over the coordinates, 

$$\int P(x,p)\, dx = \left| (2\pi\hbar)^{-1/2} \int \psi(x) \exp\{-ipx/\hbar\}\, dx \right|^2. \tag{3.3}$$


Although Wigner calls it the probability function of the simultaneous values 
for the coordinates and momenta (in more recent papers the term 
‘quasi-probability’ is adopted), he stresses in the same context that it cannot really 
be interpreted in this way “as is clear from the fact, that it may take negative 
values. But of course this must not hinder the use of it in calculations as an 
auxiliary function which obeys many relations we would expect from such a 
probability” [111]. The existence of Wigner functions taking negative values 
is firmly proved by imposing two very general conditions on P which can be 
said to define this type of probability distribution, namely: 

(i) P(x, p) should be a Hermitian form of the state vector $\psi(x)$, i.e., with 
M(x, p) a self-adjoint operator, 

$$P(x,p) = (\psi, M(x,p)\psi). \tag{3.4}$$

This condition makes P(x, p) a real number. 
(ii) P(x, p) should give the proper expectation values for all operators 
which are sums of a function of p and a function of x, 

$$\int\!\!\int P(x,p)\,[f(p) + g(x)]\, dx\, dp = \left( \psi, \left[ f\!\left( \frac{\hbar}{i} \frac{\partial}{\partial x} \right) + g(x) \right] \psi \right). \tag{3.5}$$

This condition is a somewhat milder form of (3.2) and (3.3), which properly 
have to be understood as axioms of the Wigner function and, in any case, 
must be satisfied. Further, it suffices to consider those $\psi$ which are linear 
combinations $\psi = a\psi_1 + b\psi_2$ of any two fixed functions vanishing in certain 
intervals of x. Now, by requiring 

$$P(x,p) \ge 0 \tag{3.6}$$

for all x and p and for every such $\psi$, Wigner obtains a contradiction which in 
short runs as follows: 

Consider an interval I, inside of which $\psi(x) = 0$ and $g(x) > 0$, while 
$g(x) = 0$ outside and $f(p) = 0$ everywhere. Then (3.5) leads to 

$$\int\!\!\int P(x,p)\, g(x)\, dx\, dp = 0. \tag{3.7}$$

Thus 

$$\int_I P(x,p)\, dx = 0 \tag{3.8}$$

for all p (except a set of measure zero). 

From (3.6) and the condition imposed on g(x) we obtain (Wigner’s lemma): 
If $\psi(x)$ vanishes in an interval I, the corresponding P(x, p) vanishes (except 
for a set of measure zero) for all values of x in that interval. Now, consider 
two functions $\psi_1(x)$ and $\psi_2(x)$ which vanish outside of two non-overlapping 
intervals $I_1$ and $I_2$, respectively. Because of (3.4), the P(x, p) corresponding to 
$\psi = a\psi_1 + b\psi_2$ will have the form 

$$P = |a|^2 P_1 + a^* b\, P_{12} + a b^* P_{21} + |b|^2 P_2. \tag{3.9}$$

By setting b = 0 it is obvious that $P_1$ is the Wigner function of $\psi_1$ (and $P_2$ 
that of $\psi_2$). The meaning of $P_{12}$ and $P_{21}$ is less obvious, but we need not bother, 
because both must be identically zero. This can be seen by considering 
any interval $I'$ outside $I_1$. Since, according to the above lemma, $P_1$ vanishes 
almost everywhere in the interval $I'$, (3.9) cannot be nonnegative for every choice of 
a and b unless $P_{12} = P_{21} = 0$ outside $I_1$. The same proof applies to $I_2$. Thus, 
instead of (3.9) we have 

$$P = |a|^2 P_1 + |b|^2 P_2 \tag{3.10}$$

almost everywhere. In order to complete the contradiction, let us denote the 
Fourier transforms of $\psi_1$ and $\psi_2$ by $\phi_1(p)$ and $\phi_2(p)$, respectively. Equation 
(3.3) then reads 

$$|a|^2 \int P_1(x,p)\, dx + |b|^2 \int P_2(x,p)\, dx$$

$$= |a|^2 |\phi_1(p)|^2 + |b|^2 |\phi_2(p)|^2 + 2\,{\rm Re}\,[a^* b\, \phi_1^*(p)\, \phi_2(p)].$$




Since this must be valid for all a and b, we must have identically in p: 
$\phi_1^*(p)\, \phi_2(p) = 0$. This is, however, impossible since $\phi_1$ and $\phi_2$, being Fourier 
transforms of functions restricted to finite intervals, are analytic functions of 
their arguments and cannot vanish over any finite interval. 

In order to illustrate this result, the Wigner function formalism may be 
applied to the paradigm of quantum theory, the linear harmonic oscillator 
(see W. Muckenheim [90]). From its Hamiltonian 

$$H(x,p) = p^2/2m + m\omega^2 x^2/2 \tag{3.11}$$

and the equation for the eigenfunctions of this Hamiltonian, 

$$H\!\left( x, \frac{\hbar}{i} \frac{\partial}{\partial x} \right) \psi(x) = E\, \psi(x), \tag{3.12}$$

it is easy to find the wave function of the ground state 

$$\psi_0(x) = (m\omega/\pi\hbar)^{1/4} \exp(-x^2 m\omega/2\hbar) \tag{3.13}$$

corresponding to the energy $E_0 = \hbar\omega/2$. Inserting (3.13) in (3.1) and 
integrating over y leads to 

$$P_0(x,p) = (\pi\hbar)^{-1} \exp(-x^2 m\omega/\hbar - p^2/m\omega\hbar), \tag{3.14}$$

which does not exhibit any anomaly in that it is non-negative and, when 
integrated with respect to x, supplies the proper distribution of the momentum 

$$\int P_0(x,p)\, dx = (m\omega\pi\hbar)^{-1/2} \exp(-p^2/m\omega\hbar), \tag{3.15}$$

which is a Gaussian distribution with expectation zero and variance 
$(\Delta p)^2 = m\omega\hbar/2$. Integrating with respect to p yields, as expected, the 
square of (3.13), 

$$\int P_0(x,p)\, dp = (m\omega/\pi\hbar)^{1/2} \exp(-x^2 m\omega/\hbar), \tag{3.16}$$

also a Gaussian distribution with expectation zero and variance 
$(\Delta x)^2 = \hbar/2m\omega$. Gaussian distributions satisfy Heisenberg’s uncertainty 
relation in its marginal form, i.e., as an equality. From (3.15) and (3.16) we 
obtain $(\Delta x)(\Delta p) = \hbar/2$. It may also be noted that the distributions of 
momentum and position are statistically independent, because $\int P_0\, dp \int P_0\, dx = P_0$. 
A fortiori, the covariance coefficient is zero. Clearly, this example does not 
contradict Wigner’s ‘negativity proof’ because the latter only says that there 
are state functions for which the corresponding P(x, p) cannot be everywhere 
non-negative. One of those is the first excited state of the harmonic 
oscillator. Using the state function of the first excited level, the same formalism as 
described above will lead to the corresponding Wigner function. 
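
The ground-state computation can be cross-checked numerically. The sketch below is an illustration under stated assumptions: units with hbar = m = omega = 1, a plain midpoint rule for the integral in (3.1), and the standard normalised first excited state (which the text mentions but does not write out). All function names are hypothetical.

```python
import cmath
import math

HBAR = M = OMEGA = 1.0  # assumed units, chosen only for the illustration

def psi0(x):
    """Ground-state wave function (3.13)."""
    return (M * OMEGA / (math.pi * HBAR)) ** 0.25 * math.exp(-x * x * M * OMEGA / (2 * HBAR))

def psi1(x):
    """First excited state in its standard normalised form (not given in the text)."""
    return math.sqrt(2 * M * OMEGA / HBAR) * x * psi0(x)

def wigner(psi, x, p, half_width=8.0, steps=4000):
    """Midpoint-rule approximation of (3.1) for n = 1 and a real wave function psi."""
    dy = 2 * half_width / steps
    total = 0j
    for i in range(steps):
        y = -half_width + (i + 0.5) * dy
        total += psi(x + y) * psi(x - y) * cmath.exp(2j * p * y / HBAR)
    return (total * dy / (math.pi * HBAR)).real

def wigner0_closed(x, p):
    """Closed form (3.14) for the ground state."""
    return math.exp(-x * x * M * OMEGA / HBAR - p * p / (M * OMEGA * HBAR)) / (math.pi * HBAR)

print(wigner(psi0, 0.3, -0.7), wigner0_closed(0.3, -0.7))
print(wigner(psi1, 0.0, 0.0))   # negative at the origin of phase space
```

The quadrature reproduces (3.14) for the ground state, while for the first excited state the value at the origin is negative, in line with the negativity result above.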

With H the Hamiltonian of (3.11) and $L_n$ the nth Laguerre polynomial, 
the Wigner function corresponding to the nth excited state can be expressed 
by 

$$P_n(x,p) = (\pi\hbar)^{-1} (-1)^n \exp(-2H/\hbar\omega)\, L_n(4H/\hbar\omega) \tag{3.17}$$

or, using (3.14), 

$$P_n(x,p) = (-1)^n P_0(x,p)\, L_n(4H/\hbar\omega). \tag{3.18}$$

As $P_0$ was found to be non-negative everywhere, we have to examine the 
remaining expression 

$$P_n/P_0 = (-1)^n L_n(4H/\hbar\omega). \tag{3.19}$$

The first-order Laguerre polynomial is 

$$L_1(u) = 1 - u. \tag{3.20}$$

Hence, $P_1(x,p)$ goes negative for 

$$H = p^2/2m + m\omega^2 x^2/2 < \hbar\omega/4. \tag{3.21}$$

Therefore the Wigner distribution $P_1(x,p)$ becomes negative only in the 
extremely small domain (the ellipse (3.21)). As the energy of the first excited 
state is $E_1 = \frac{3}{2}\hbar\omega$, the probability of an energy measurement $P(E < \hbar\omega/4)$ (where 
E is the energy of the quantum harmonic oscillator) is equal to zero. Hence in this 
example negative values of the Wigner distribution $P_1(x,p)$ correspond to 
events which have zero conventional probability. The use of the Wigner 
distribution can be interpreted as a kind of splitting of conventional zero 
probabilities by using negative numbers (as a class of labels to denote 
probabilities of events which are identified in the conventional framework with the 
label ‘0’). 

We now consider the Wigner function of the second excited state. The 
second-order Laguerre polynomial is 

$$L_2(u) = 2 - 4u + u^2. \tag{3.22}$$

Using (3.19) we obtain that $P_2(x,p)$ goes negative for 

$$\tfrac{1}{2}\hbar\omega\,(1 - 2^{-1/2}) < H < \tfrac{1}{2}\hbar\omega\,(1 + 2^{-1/2}). \tag{3.23}$$

As the energy of the second excited state is $E_2 = \frac{5}{2}\hbar\omega$, the probability of an energy 
measurement $P(E < \frac{1}{2}\hbar\omega(1 + 2^{-1/2}))$ is equal to zero. Hence in this example 
negative values of the Wigner distribution $P_2(x,p)$ can also be interpreted 
as additional labels for probabilities (which are identified with the label ‘0’ 
in the conventional probability theory) corresponding to events which have 
zero conventional probability. For H = 0 and $H \to \infty$, however, $P_2(x,p)$ is 
non-negative. 
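
The sign structure of the first two excited states can be checked directly from the ratio (3.19). This is only a sketch: the function names are hypothetical, energies are measured in units of hbar*omega, and the polynomials are generated from the standard three-term recurrence and then rescaled by n! to match the convention L_n(0) = n! used in the text.

```python
import math

def laguerre(n, u):
    """Laguerre polynomial in the convention L_n(0) = n! used in the text."""
    # Recurrence for the polynomials normalised to 1 at u = 0,
    # rescaled by n! at the end.
    prev, curr = 1.0, 1.0 - u
    if n == 0:
        return prev
    for k in range(1, n):
        prev, curr = curr, ((2 * k + 1 - u) * curr - k * prev) / (k + 1)
    return math.factorial(n) * curr

def wigner_ratio(n, h_over_hw):
    """P_n / P_0 = (-1)^n L_n(4H / (hbar*omega)), as in (3.19)."""
    return (-1) ** n * laguerre(n, 4.0 * h_over_hw)

# First excited state: negative below H = hbar*omega/4, positive above.
print(wigner_ratio(1, 0.2), wigner_ratio(1, 0.3))
# Second excited state: negative only inside the band (3.23).
lower = 0.5 * (1 - 2 ** -0.5)
upper = 0.5 * (1 + 2 ** -0.5)
print(lower, upper, wigner_ratio(2, 0.1), wigner_ratio(2, 0.5), wigner_ratio(2, 0.9))
```

Since P_0 is positive everywhere, the sign of the ratio is the sign of the Wigner function itself.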

We will not leave this illustrative example without noting some general 
features of the Wigner functions of the linear harmonic oscillator. From 
$L_n(u) \sim (-1)^n u^n$ as $u \to \infty$ and (3.17) we find that $P_n$ is positive and 
asymptotically approaches zero as H goes to infinity. In the special case of 
H = 0, $L_n(0) = n!$ together with (3.17) makes the even-order $P_n$ positive 
and the odd-order $P_n$ negative at H = 0. 

Most interesting in the present context is, however, that all these Wigner 
functions of nonzero order unavoidably take positive as well as negative 
values. This can easily be seen from the orthogonality relation 

$$\frac{1}{(n!)^2} \int_0^{\infty} e^{-u} L_n(u)\, L_m(u)\, du = \delta_{nm}. \tag{3.24}$$


Cohen (see, for example, the review [90] for the details) could show that 
a wide class of probability distribution functions is supplied by the rather 
general expression 

$$P(x,p) = (2\pi)^{-2} \int\!\!\int\!\!\int f(\theta,\tau) \exp(-i\theta x - i\tau p + i\theta u)\, \psi^*(u - \tau\hbar/2)\, \psi(u + \tau\hbar/2)\, d\theta\, d\tau\, du.$$

Herein f is simply a smearing function. By setting f = 1, substituting $\tau$ by 
$-2y/\hbar$ and integrating over $\theta$ and u, we obtain the original Wigner function 
(3.1). Other distribution functions may be built with different functions f, 
if only f satisfies the condition $f(0,\tau) = f(\theta,0) = 1$ in order to yield the 
correct quantum mechanical marginal distributions. 

Cohen imposed the following conditions on a general distribution function 
P(x, p): (i) those given by (3.2) and (3.3); (ii) if the quantum mechanical 
mean value of the Hermitian operator M is $\langle M \rangle$, then there should exist a 
function $g_M(x,p)$ such that 

$$\langle M \rangle = \int\!\!\int g_M(x,p)\, P(x,p)\, dp\, dx; \tag{3.25}$$

and, for any function K, 

$$\langle K(M) \rangle = \int\!\!\int K(g_M(x,p))\, P(x,p)\, dp\, dx. \tag{3.26}$$

And he found that, irrespective of whether P is positive semidefinite or not, 
condition (ii) can never be satisfied. The Wigner function $P_n$ of the harmonic 
oscillator, e.g., yields the correct expectation value for the mean energy, but 
fails to supply the zero standard deviation which one should expect from 
a quantum mechanical energy eigenstate. He concludes: “Of course, it can 
be argued that the classical formalism does go through as long as we do 
not insist that the function which must be used to obtain the mean value of 
a function, K, of g is not identical to K(g). But this would carry us even 
further from the conceptual basis of classical probability theory than does 
quantum mechanics itself”. 

Finally we note that the equality (3.25) is a direct consequence of the 
formula of total probability. Let M be an orthogonal projector in the Hilbert 
space of quantum states. It represents the physical observable M = 0, 1. 
Here $P(M = 1) = \langle M \rangle$ and (3.25) is nothing but the formula of total 
probability for the initial probability distribution P(x, p) and the conditional 
probabilities $P(M = 1/(x,p)) = g_M(x,p)$. If we follow our interpretation 
of negative probabilities, then we obtain that Hermitian operators represent 
all physical observables which permit measurements having the property of 
statistical stabilization. Non-Hermitian operators represent a new class 
of physical observables which do not permit measurements with the property 
of statistical stabilization. Here we could obtain (for some states) a 
negative mean value for an observable with positive values. 

On the basis of our interpretation of negative probabilities we can finish
this section with the following

Conclusion. From the frequency viewpoint, negative values of Wigner’s
probability distribution are nothing but a manifestation of the absence of
statistical stabilization of relative frequencies $\nu_N((x,p) \in U)$ for some domains
U of the phase space; from the ensemble viewpoint, negative values of Wigner’s
probability distribution are nothing but a manifestation of the nonregular structure
of infinite statistical ensembles of hidden properties which determine the point
(x,p) of the phase space.


4 Dirac’s world with negative probabilities 


The necessity of extended probabilities becomes most distinct if a
Lorentz-invariant formulation of quantum theory is attempted. The special role that
time plays in non-relativistic theory can, e.g., in the simplest case of
particles with no charge and spin, be removed by means of the Klein-Gordon
equation, which for a single free particle of rest mass m is given by

$$\left(\frac{\partial^2}{\partial x_0^2} - \frac{\partial^2}{\partial x_1^2} - \frac{\partial^2}{\partial x_2^2} - \frac{\partial^2}{\partial x_3^2} + m^2\right)\psi = 0,$$


where $\hbar = c = 1$. Born’s notion, however, according to which the square
of the wave function has to be interpreted as a probability density, necessarily
must fail in this context, because $|\psi|^2$ as a scalar violates conservation of
total probability. On the other hand, the density proposed by Gordon and
Klein,

$$P(x_0, x_1, x_2, x_3) = \frac{1}{2im}\left(\psi^*\frac{\partial \psi}{\partial x_0} - \psi\frac{\partial \psi^*}{\partial x_0}\right), \tag{4.1}$$

satisfies, as the time component of a four-vector, the conservation law, and thus
(4.1) is evidently the correct mathematical form to use, but, clearly, it can
go negative.

This is not the only difficulty. If the wave function of a plane wave

$$\psi = \exp[-i(p_0 x_0 - p_1 x_1 - p_2 x_2 - p_3 x_3)], \qquad p_0 = E,$$

is transformed to the momentum and energy variables, the Gordon-Klein
expression (4.1) goes over into

$$|\psi(p_0, p_1, p_2, p_3)|^2\, p_0^{-1}\, dp_1\, dp_2\, dp_3 \tag{4.2}$$

as the probability of the momentum having a value within the small domain
$dp_1 dp_2 dp_3$ about a value $p_1, p_2, p_3$ with the energy having the value $p_0$, which
must be connected with $p_1, p_2, p_3$ by

$$p_0^2 - p_1^2 - p_2^2 - p_3^2 - m^2 = 0.$$

The weight factor $p_0^{-1}$ appears in (4.2) and makes it Lorentz invariant, since
$\psi(p)$ is a scalar (it is defined in terms of $\psi(x)$ to make it so) and the
differential element $p_0^{-1}\, dp_1 dp_2 dp_3$ is Lorentz invariant. This weight factor may
be positive or negative, and makes the probability positive or negative
accordingly. Thus the two undesirable things, negative energy and negative
probability, always occur together. By our interpretation of negative
probabilities, one possible explanation of this fact is that the probability to
observe a particle with a negative energy is infinitely small.

Dirac formulates an alternative approach to quantum electrodynamics 
which allows for a conventional treatment of particles with half-odd integral 
spin, but unavoidably entails negative probabilities when applied to particles 
with integral spin, in special cases even demanding probabilities of plus or 
minus 2, distinctly outside the usual range. On the other hand, this relativistic 
theory has great advantages over the usual method in that it avoids the most arti- 
ficial process of renormalization. With respect to the latter, Dirac never changed 
his mind, qualifying it as a ‘working rule’ and considering its results, in spite of 
their accuracy, as not reliable. Indeed, the commonly applied method of renormalization
lies somewhere between the artificial and the nonphysical. We are left between Scylla
and Charybdis, in that our equations contain either probabilities as large as plus
or minus 2 or electron masses exceeding that of the whole universe. Dirac, too, was
evidently very sceptical about those “undesirable things, negative energy and negative
probability”, but he asserts: “Negative energies and probabilities should not
be considered as nonsense. They are well-defined concepts mathematically, like a 
negative sum of money, since the equations which express the important properties 
of energies and probabilities can still be used when they are negative. Thus nega- 
tive energies and probabilities should be considered simply as things which do not 
appear in experimental results. The physical interpretation of relativistic quantum 
mechanics that one gets by a natural development of the non-relativistic theory 
involves these things and is thus in contradiction with experiment. We therefore 
have to consider ways of modifying or supplementing this interpretation” [34]. 

To remove the divergences, Dirac proposed considering a representation
including both positive and negative energies. Then, to resolve the problem of
negative energies, he proposed treating operators of emission of photons
with negative energy as absorption operators of photons with positive energy.
But this picture contains negative probabilities for the absorption of any odd
number of photons.

Let $A^1_\mu(x)$ be the operators of the quantum electrodynamics of Heisenberg
and Pauli referring to emission and absorption of photons into positive energy
states:

$$A^1_\mu(x) = \int\!\!\int\!\!\int \left(R_{k\mu}\, e^{i(kx)} + \bar{R}_{k\mu}\, e^{-i(kx)}\right) k_0^{-1}\, dk_1\, dk_2\, dk_3, \tag{4.3}$$

where $k_0 = +\sqrt{k_1^2 + k_2^2 + k_3^2}$, $\bar{R}_{k\mu}$ being the emission operator and $R_{k\mu}$ the
absorption operator. In the same way we introduce the operators $A^2_\mu(x)$
referring to negative energy; there is a representation similar to (4.3) but
with $k_0 = -\sqrt{k_1^2 + k_2^2 + k_3^2}$. Dirac considered the operators
$A_\mu = (1/\sqrt{2})(A^1_\mu + A^2_\mu)$, which are expanded with respect to the operators
$\bar{R}_{k\mu}$ and $R_{k\mu}$ corresponding to positive and negative energies.

The idea was to solve all divergence problems in the symmetric $A_\mu(x)$
representation. Then we can obtain some information about the $A^1_\mu(x)$
representation. But we cannot apply the linear transformation between the $A_\mu(x)$
and $A^1_\mu(x)$ representations to the wave function of the $A_\mu(x)$ representation:
the same divergences would arise. But we can do this with the initial
Gibbs ensemble of the $A_\mu(x)$ representation.

It is convenient to consider, together with $A_\mu(x)$, the additional fields

$$B_\mu(x) = \frac{1}{\sqrt{2}}\left(A^1_\mu(x) - A^2_\mu(x)\right),$$

which commute with $A_\mu(x)$, so they are redundant variables. Now let us
take B equal to the initial value of A. Then for the initial wave function
$\psi$, $(B_\mu(x) - A_\mu(x))\psi = 0$ or $R_{k\mu}\psi = 0$ with $k_0$ either positive or negative.
Thus any absorption operator applied to the initial wave function gives the
result zero, which means that the corresponding state is one with no photons
present.

The following natural interpretation of the wave function at some later 
time now appears. That part corresponding to m photons of positive energy 
and n photons of negative energy can be interpreted as corresponding to m 
photons having been emitted and n photons having been absorbed. 

Dirac then considered the momentum representation of A(x) and B(x) 
operators. Let k be a momentum-energy vector, k? = 0, and Eku Eku be 


operators of emission and absorption. There ky = +y k? + k2 + k2. Then set 
Cku = kp for ko > 0 and consider the wave function w as y = W(€,¢), ko > 
0. The following commutation relations hold: $[\xi^*, \xi] = c$ and $[\zeta^*, \zeta] = -c$,
$c > 0$.

The variables $\xi$ correspond to the emission of photons of positive energy
$k_0 > 0$ and the $\zeta$ correspond to the absorption of photons of positive energy
$k_0 > 0$. Let us denote the space of states $\psi(\xi, \zeta)$ by the symbol H. The inner
product in H has the form

$$(f, g) = \sum_{m,n=0}^{\infty} f_{mn}\, \bar{g}_{mn}\, c^m m!\, (-c)^n n!$$

for the functions

$$f(\xi, \zeta) = \sum_{m,n} f_{mn}\, \xi^m \zeta^n, \qquad g(\xi, \zeta) = \sum_{m,n} g_{mn}\, \xi^m \zeta^n.$$

Now for the wave function $\psi(\xi, \zeta)$, normalized by $\|\psi\|^2 = (\psi, \psi) = 1$, the
probability of there having been m photons emitted into the momentum and
energy state k (corresponding to $\xi$) and n photons absorbed from this state
is

$$P(m,n) = |\psi_{mn}|^2\, c^m m!\, (-c)^n n!.$$


It gives a negative probability for an odd number of photons having been ab- 
sorbed. But this statistical interpretation has no meaning in the framework 
of the ordinary theory of probability. Nevertheless, we can explain the ap- 
pearance of such ‘generalized’ probabilities. On one hand, they may appear 
as a consequence of the violation of the law of large numbers. On the other 
hand, they demonstrate that Dirac’s formalism gives a fine internal structure 
of theory which could not be described by conventional probabilities. 
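
The sign pattern of these generalized probabilities can be checked numerically. The following is a minimal sketch, not Dirac's own computation: the function names and the sample coefficients are hypothetical, and it assumes a state with finitely many nonzero coefficients $\psi_{mn}$ whose indefinite norm $(\psi, \psi)$ is nonzero, so that it can be normalized to 1.

```python
from math import factorial


def dirac_weight(m: int, n: int, c: float) -> float:
    """Weight c^m m! (-c)^n n! that multiplies |psi_mn|^2 in P(m, n)."""
    return (c ** m) * factorial(m) * ((-c) ** n) * factorial(n)


def dirac_probabilities(coeffs: dict, c: float) -> dict:
    """Return P(m, n) for coefficients {(m, n): psi_mn}, rescaled so that (psi, psi) = 1.

    The inner product is indefinite, so the norm may vanish for some states;
    such states are rejected here (an assumption of this sketch).
    """
    raw = {mn: abs(a) ** 2 * dirac_weight(mn[0], mn[1], c) for mn, a in coeffs.items()}
    norm = sum(raw.values())
    if norm == 0:
        raise ValueError("state has zero indefinite norm and cannot be normalized")
    return {mn: value / norm for mn, value in raw.items()}


if __name__ == "__main__":
    # Hypothetical coefficients psi_mn, chosen only for illustration.
    sample = {(0, 0): 1.0, (1, 0): 0.5, (0, 1): 0.3, (1, 1): 0.2}
    for (m, n), prob in sorted(dirac_probabilities(sample, c=1.0).items()):
        print(f"P({m},{n}) = {prob:+.4f}")
```

For this sample the entries with odd n come out negative while all entries still sum to 1, which is the behaviour described in the text.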


5 Negative probabilities and localization 


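One reason for the difficulties with quantum electrodynamics is the general
Lorentz condition, according to which the four-divergence of the electromagnetic
potential A must vanish:

$$\frac{\partial A^0}{\partial x_0} + \frac{\partial A^1}{\partial x_1} + \frac{\partial A^2}{\partial x_2} + \frac{\partial A^3}{\partial x_3} = 0.$$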


A photon density obtained from this continuity equation suffers from the 
same problems as the Gordon-Klein conserved density (4.1) in that it is not 
positive semidefinite, or, according to the opinion of the respective referee, 
it does not exist. This problem might be related to the fact that photons 
cannot be sharply localized. If they could, we could define the photon density 
as the number of photons per unit volume in some arbitrarily small volume.
However, in a relativistic field, we cannot define such a density. 

Therefore it seems that there are two ways to describe reality:
(1) to assume that physical systems cannot be localized with arbitrary
precision and to use Kolmogorov’s axiomatics of probability theory; (2) to
assume that physical systems can be localized with arbitrary precision, but
to change Kolmogorov’s axiomatics and create probability theories in which
negative probabilities (as well as probabilities which are larger than 1) are
mathematically well defined.

If we follow (1), then we have to deny the ‘continuous’ model of space-time
based on real numbers. The system of real numbers R describes reality
with an infinite precision. Here a physical quantity a is represented by the
real number

$$a = \cdots + \frac{\alpha_{-k}}{m^k} + \cdots + \frac{\alpha_{-1}}{m} + \alpha_0 + \cdots + \alpha_n m^n = \alpha_n \ldots \alpha_0 , \alpha_{-1} \ldots \alpha_{-k} \ldots , \tag{5.1}$$

where $\alpha_j = 0, 1, \ldots, m-1$, and a natural number m > 1 gives the scale of
a measurement. All digits in (5.1) can be measured (at least theoretically),
thus a ‘exists with the infinite precision.’ It would be natural to consider
other number systems based on expansions which are similar to (5.1) and
describe reality with a finite precision. Here we could use a system of
m-adic numbers $Q_m$ which is well known in number theory (mainly in the case
where m = p is a prime number). These are quantities of the form

$$a = \frac{\alpha_{-k}}{m^k} + \cdots + \frac{\alpha_{-1}}{m} + \alpha_0 + \cdots + \alpha_n m^n + \cdots,$$

where $\alpha_j = 0, 1, \ldots, m-1$ (thus there is only a finite number of terms
corresponding to negative powers of m). A physical formalism based on an
m-adic ‘finite-precision world’ has been developed in [63], [65] (in fact, such
a viewpoint is closely connected with the theory of measurements based on
nonorthogonal operator valued measures [24], [46], [81]).

If we follow (2), then we can assume that a physical system (in particular, 
photon) can be localized with an arbitrary precision (i.e., we can still use 
the real space in quantum theory). However, we could not assume that we 
should obtain the ordinary (Kolmogorov or Mises) probabilities if we measure 
statistical distributions corresponding to a ‘real localization’. 

Our consideration of the precision of measurements of physical quantities and
negative probabilities can be illustrated by the formalism of the quantum
theoretical description of radiation. It is given by extending (see the review [90])
the work of Weisskopf and Wigner, who calculated the natural linewidth of
radiative decay of an excited atom. The corresponding transition amplitude
may be rewritten as

$$A(E,t) = \frac{1}{\sqrt{2\pi}}\, \frac{e^{-t/2} - e^{iEt}}{1/2 - iE},$$

with E denoting the difference between the actual photon energy and the mean state
energy $E_0$ in units of the natural width of the excited state, and t denoting
the time interval between excitation and decay in units of the mean lifetime
of the state.

It is now very interesting to consider the spectral distribution of photons
emitted in finite time intervals. For the time interval (0,t) we have

$$|A(E,t)|^2 = \frac{1}{2\pi}\, \frac{1 - 2e^{-t/2}\cos(Et) + e^{-t}}{E^2 + 1/4},$$

which undoubtedly is non-negative for every E and t. The spectral distribution
emitted at time t, however, $I(E,t) = d|A(E,t)|^2/dt$, entails negative
values, as can easily be seen from

$$I(E,t) = \frac{1}{2\pi}\, \frac{(2E\sin(Et) + \cos(Et))\, e^{-t/2} - e^{-t}}{E^2 + 1/4}.$$

Further, if the quantity

$$I_{\rm norm}(E,t) = |A(E, t \to \infty)|^2 = \frac{1}{2\pi}\, \frac{1}{E^2 + 1/4}$$

is used to normalize I(E,t), we obtain the normalized decay probability
density $p(t) = I(E,t)/I_{\rm norm}(E,t)$, which can take on negative values as well as
values exceeding unity, and, if integrated over suitable domains $\Delta E\, \Delta t$, small
compared to unity ($= \hbar$), the normalized probability $p(t)\, \Delta E\, \Delta t$, which is
an observable quantity, may violate both the lower and the upper limit of
Kolmogorov’s axiom. These results have been verified by experiments.

As has been pointed out, if the quantities E and t are measured with
extremely high precision, $\Delta E\, \Delta t < \hbar/2$, then it is quite natural that
negative probabilities appear.
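
The claims about the signs can be checked directly. The sketch below assumes the dimensionless form $|A(E,t)|^2 = (1 - 2e^{-t/2}\cos(Et) + e^{-t}) / (2\pi(E^2 + 1/4))$ and its time derivative $I(E,t)$; the function names are illustrative only. It evaluates the normalized density at one point where it is negative and one where it exceeds 1.

```python
import math


def amplitude_sq(E: float, t: float) -> float:
    """|A(E,t)|^2 for the interval (0, t), in the dimensionless units of the text."""
    return (1 - 2 * math.exp(-t / 2) * math.cos(E * t) + math.exp(-t)) / (
        2 * math.pi * (E * E + 0.25)
    )


def spectral_rate(E: float, t: float) -> float:
    """I(E,t) = d|A(E,t)|^2 / dt, obtained by differentiating amplitude_sq in t."""
    numerator = (2 * E * math.sin(E * t) + math.cos(E * t)) * math.exp(-t / 2) - math.exp(-t)
    return numerator / (2 * math.pi * (E * E + 0.25))


def normalized_density(E: float, t: float) -> float:
    """p = I(E,t) / I_norm(E), where I_norm(E) = 1 / (2 pi (E^2 + 1/4))."""
    return (2 * E * math.sin(E * t) + math.cos(E * t)) * math.exp(-t / 2) - math.exp(-t)


if __name__ == "__main__":
    E = 5.0
    # E*t = 3*pi/2 makes sin(Et) = -1, so the density is negative there.
    print(normalized_density(E, 3 * math.pi / (2 * E)))
    # E*t = pi/2 makes sin(Et) = +1, so the density exceeds 1 there.
    print(normalized_density(E, math.pi / (2 * E)))
```

The integrated quantity amplitude_sq stays non-negative throughout; only its rate of change, and hence the normalized density, leaves the interval [0, 1].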
p-adic probability theory


The development of a non-Archimedean (especially, p-adic) mathematical
physics [108], [107], [41], [55]-[63], [67]-[69], [4] induced some new
mathematical structures over non-Archimedean fields. In particular, probability
theory with p-adic valued probabilities was developed in [56], [60], [64], [65],
[72], [73]. This probability theory appeared in connection with a model of
quantum mechanics with p-adic valued wave functions [57]. The main task
of this probability formalism was to provide a probability interpretation
for p-adic valued wave functions.

The first theory with p-adic probabilities was the frequency theory in
which probabilities were defined as limits of relative frequencies $\nu_N = n/N$
in the p-adic topology.¹ This frequency probability theory was a natural
extension of the frequency probability theory of R. von Mises [86]-[88]. One of
the most interesting features of the p-adic frequency theory of probability is
the possibility to obtain negative (rational) probabilities as limits of relative
frequencies. Thus the negative probabilities which have been considered in
Chapter 3 can be obtained on the mathematical level of rigorousness as p-adic
probabilities. Typically p-adic frequency negative probabilities (as well as
probabilities which are larger than 1) appear in cases of violation of the
ordinary Mises statistical stabilization (with respect to the real metric). In
fact, in this Chapter we shall only consider a p-adic generalization of Mises’
principle of statistical stabilization. Thus we shall only study a p-adic
generalization of the notion of the S-sequence. The next natural step is to
find a p-adic generalization of Mises’ principle of randomness. This problem
will be studied in Chapter 6 (on the basis of a p-adic generalization of
Martin-Löf’s theory of statistical tests).

¹ The following trivial fact is the cornerstone of this theory: the relative frequencies
belong to the field of rational numbers Q; we can study their behaviour not only in the
real topology on Q, but also in some other topologies on Q and, in particular, in the p-adic
topologies on Q.

The next step was the creation of a p-adic probability formalism on the
basis of a theory of p-adic valued probability measures. It was natural to do
this by following the fundamental work of A.N. Kolmogorov [74] in which he
had proposed the measure-theoretical axiomatics of probability theory.
Kolmogorov used properties of the frequency (Mises) probability (non-negativity,
normalization by 1 and additivity) as the basis of his axiomatics. Then he
added the technical condition of $\sigma$-additivity for using Lebesgue’s integration
theory. In works [56], [60] we tried to follow A.N. Kolmogorov. The p-adic
frequency probability also has the property of additivity, it is normalized by
1 and the set of possible values of this probability is the whole field of p-adic
numbers $Q_p$. Thus it was natural to define p-adic probability as a $Q_p$-valued
measure normalized by 1.

However, it was a rather complicated problem to propose a p-adic
analogue of the condition of $\sigma$-additivity. It is a well known fact that all
$\sigma$-additive $Q_p$-valued measures defined on $\sigma$-rings are discrete measures [97],
[104]. Therefore the creators of non-Archimedean integration theory (A.
Monna and T. Springer [89]) did not try to develop an abstract measure theory;
instead they proposed an integration formalism in the style of Bourbaki, based on
integrals of continuous functions. This integration theory has been used for creating
p-adic probability theory in the measure-theoretical framework [60]. The
main disadvantage of this probability model is the strong connection with
the topological structure of a sample space. This is quite similar to the old
probability formalisms of Kolmogorov [75], Fréchet [40] and Cramér [23] in
which the topological structure of the sample space played an important
role.

An abstract theory of non-Archimedean measures has been developed by
A. van Rooij [104]. The basic idea of this approach is to study measures
defined on rings which in principle cannot be extended to measures on
$\sigma$-rings. This gives the possibility of constructing non-discrete p-adic valued
measures. On the other hand, the condition of continuity for measures in
[104] implies $\sigma$-additivity in all natural cases.²


² Thus $\sigma$-additivity is not a problem. The problem is to find the right domain of
definition of p-adic probabilistic measures.
In this Chapter we develop a p-adic probability formalism based on the
measure theory of [104]. For probabilistic reasons we use a special case of this
measure theory: measures defined on algebras (such measures have some
special properties). However, probabilistic applications also stimulate the
development of the general theory of non-Archimedean measures defined on
rings. We prove the formula of the change of variables for these measures and
use this formula for developing the formalism of conditional expectations for
p-adic valued random variables (see also [73]).

The use of p-adic valued probabilistic measures gives the possibility to
work on the mathematical level of rigorousness with all signed ‘probabilities’
(for example, with Wigner’s distribution).

As the fields of p-adic numbers are non-Archimedean, there exist infinitely
large p-adic numbers (in particular, infinitely large natural numbers) in $Q_p$.
Thus p-adic analysis gives the possibility to use actual infinities and consider
statistical ensembles with an infinite number of elements. Probabilities with
respect to such ensembles are defined via the standard proportion (used in
Chapter 1 for finite ensembles). One of the main features of such ensemble
probabilities is the appearance of negative (rational) probabilities (as well
as probabilities which are larger than 1). In this approach the origin of
such ‘pathological’ (from the real viewpoint) probabilities is very clear. In
particular, we shall see that a large set of negative probabilities is naturally
interpreted as a set of infinitely small probabilities (giving the split of the
conventional probability 0). We shall also see that a large set of probabilities
which are larger than 1 is naturally interpreted as a set of probabilities which
differ negligibly from 1. Another interesting property of p-adic ensemble
probability is that the corresponding probabilistic measure is not well defined
on a set algebra: the system of events is only a set semi-algebra.


1 Non-Archimedean number systems; p-adic 
numbers 


Here we present a brief introduction to non-Archimedean and, in particular, 
p-adic analysis (see, for example, [97], [104], [107], [60], [65]). 

Let F be a ring³ (a set where addition, subtraction and multiplication
are well defined). Recall that a norm is a mapping $|\cdot|_F : F \to \mathbf{R}_+$ satisfying
the following conditions:

$$|x|_F = 0 \iff x = 0 \quad \text{and} \quad |1|_F = 1, \tag{1.1}$$
$$|xy|_F \le |x|_F\, |y|_F, \tag{1.2}$$
$$|x + y|_F \le |x|_F + |y|_F. \tag{1.3}$$

The ring F with the norm $|\cdot|_F$ is called a normed ring. Set
$|F| = \{r \in \mathbf{R}_+ : r = |x|_F,\ x \in F\}$.


³ By a ring we always mean a commutative ring with identity 1.

The inequality (1.3) is the well known triangle axiom. A norm is said to
be non-Archimedean if the strong triangle axiom is valid, i.e.,

$$|x + y|_F \le \max(|x|_F, |y|_F). \tag{1.4}$$

A ring F with a non-Archimedean norm is said to be a non-Archimedean
ring. We shall use the following property of a non-Archimedean norm:

$$|x + y|_F = \max(|x|_F, |y|_F), \quad \text{if } |x|_F \ne |y|_F. \tag{1.5}$$

In order to prove (1.5) we may assume $|x|_F < |y|_F$. By (1.4) we find
$|y|_F \le \max(|x + y|_F, |x|_F) \le \max(|x|_F, |y|_F)$. The assumption $|x|_F < |y|_F$ gives
$\max(|x|_F, |y|_F) = |y|_F$. Hence $|y|_F = \max(|x + y|_F, |x|_F)$. From $|x|_F < |y|_F$
we deduce $|y|_F = |x + y|_F$. This gives (1.5).

If a norm $|\cdot|_F$ has the property $|xy|_F = |x|_F\, |y|_F$, then it is called a
valuation (sometimes a norm is called a pseudo-valuation). A ring F with
the valuation $|\cdot|_F$ is called a valued ring. The absolute value $|\cdot| = |\cdot|_{\mathbf{R}}$ on
the field of real numbers R is an example of a valuation. This valuation does
not satisfy the strong triangle inequality (it satisfies only (1.3)). Valuations
and norms with such a property are called Archimedean. Another example
of an Archimedean valuation is the absolute value $|\cdot| = |\cdot|_{\mathbf{C}}$ on the field of
complex numbers C.

Denote by Z(F) the ring generated in F by its unity element. If F has
zero characteristic (i.e., $n \cdot 1 = 1 + \cdots + 1 \ne 0$ for any n = 1, 2, ...), then
Z(F) is isomorphic to the ring of integers Z. Therefore in this case we can
consider Z as a subring of F. In what follows we consider only normed rings
F which have zero characteristic.

To illustrate how we can work with the strong triangle inequality we 
present two simple results. 

Proposition 1.1. Let $|\cdot|_F$ be a non-Archimedean norm. Then $|n|_F \le 1$
for all elements $n \in \mathbf{Z}$.

Proof. By the strong triangle inequality (1.4) we have, for n = 1, 2, ...,

$$|n|_F = |1 + \cdots + 1|_F \le \max(|1|_F, \ldots, |1|_F) = 1.$$

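Proposition 1.2. A valuation $|\cdot|_F$ is a non-Archimedean valuation if
and only if $|n|_F \le 1$ for all elements $n \in \mathbf{Z}$.

Proof. Let $|n|_F \le 1$ for all n = 1, 2, ... Denote by $\binom{n}{k}$ the binomial
coefficients, i.e.,

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \qquad k \le n.$$

As these coefficients are integers, $\left|\binom{n}{k}\right|_F \le 1$ for all n and k. Hence we have:

$$|x + y|_F^n = \Big|\sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}\Big|_F \le \sum_{k=0}^{n} |x|_F^k\, |y|_F^{n-k} \le (n+1)\big(\max(|x|_F, |y|_F)\big)^n,$$

i.e.,

$$|x + y|_F \le \lim_{n \to \infty} (1 + n)^{1/n} \max(|x|_F, |y|_F) = \max(|x|_F, |y|_F).$$

The converse implication is Proposition 1.1.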
Let $|\cdot|_F$ be a norm on a ring F. Then the function $\rho_F(x,y) = |x - y|_F$
is a metric on F. It is a translation invariant metric, i.e., $\rho_F(x + h, y + h) =
\rho_F(x,y)$. As usual in metric spaces we define ‘closed’ and ‘open’ balls in F:
$U_r(a) = \{x \in F : \rho_F(x,a) \le r\}$, $U_r^-(a) = \{x \in F : \rho_F(x,a) < r\}$, $r \in \mathbf{R}_+$.
We set $U_r = U_r(0)$. It should be noted that any ball $U_r(a)$, $r \in \mathbf{R}_+$, coincides
with some ball $U_s(a)$, $s \in |F|$, $s \le r$. In what follows we consider only balls
$U_r(a)$ with $r \in |F|$. The spheres in F are defined by $S_r(a) = \{x \in F :
\rho_F(x,a) = r\}$, $r \in \mathbf{R}_+$. Of course, if $r \notin |F|$ then $S_r(a) = \emptyset$. Therefore it is
meaningful to consider only spheres of radius $r \in |F|$. The normed ring F is
complete if it is a complete metric space with respect to the metric $\rho_F$.

Let $|\cdot|_F$ be a non-Archimedean norm. Then the corresponding metric $\rho_F$
satisfies the strong triangle inequality:

$$\rho_F(x,y) \le \max[\rho_F(x,z), \rho_F(z,y)]. \tag{1.6}$$


A metric of this kind is called an ultrametric. We note that any ‘open’
or ‘closed’ ball in an ultrametric space is simultaneously a closed and an open
subset. Such sets are called ‘clopen’ sets. Spheres in F are also clopen.
This seems strange from the point of view of our Euclidean intuition. The
balls $U_r$ are additive subgroups of F: if $|x|_F, |y|_F \le r$, then $|x + y|_F \le
\max(|x|_F, |y|_F) \le r$. Moreover, the ball $U_1$ is a ring: if $|x|_F, |y|_F \le 1$ then
$|xy|_F \le |x|_F\, |y|_F \le 1$.

We shall constantly use the following simple result.

Lemma 1.1. (‘The dream of a bad student’) Let F be a complete
non-Archimedean normed ring. The series $\sum_{n=1}^{\infty} a_n$, $a_n \in F$, converges in F if
and only if $a_n \to 0$, $n \to \infty$.

To prove this result we use the Cauchy theorem in complete metric spaces
(a sequence $\{S_n\}$ converges iff it is a fundamental sequence, i.e., $|S_n - S_m|_F \to
0$, $n, m \to \infty$) and the estimate $\left|\sum_{k=n+1}^{m} a_k\right|_F \le \max_{n+1 \le k \le m} |a_k|_F$.

One of the most important non-Archimedean fields, the system of p-adic
numbers $Q_p$, was constructed by K. Hensel [44]. In fact, it was the first
example of a commutative number field (a system where the operations of addition,
subtraction, multiplication and division are well defined) which was different
from the fields of real and complex numbers. For almost 100 years p-adic
numbers were considered only as objects of pure mathematics. In recent
years these numbers have been intensively used in theoretical physics (see, for
example, the books [107], [60], [65], [52] and the papers [108], [41], [3], [53]-[55],
[57]-[59]), in the theory of probability [56], [65], as well as in investigations of
chaos and dynamical systems [70], [65] and in applications to cognitive sciences
and psychology [65], [66], [68], [69].

The field of real numbers R is constructed as the completion of the field of
rational numbers Q with respect to the metric $\rho_{\mathbf{R}}(x,y) = |x - y|$, where $|\cdot|$ is
the usual valuation given by the absolute value. The fields of p-adic numbers
$Q_p$ are constructed in a corresponding way, by using other valuations. For
any prime number p the p-adic valuation $|\cdot|_p$ is defined in the following way.
First we define it for natural numbers. Every natural number n can be
represented as the product of prime numbers: $n = 2^{r_2} 3^{r_3} \cdots p^{r_p} \cdots$. Then
we define $|n|_p = p^{-r_p}$; we set in addition $|0|_p = 0$ and $|-n|_p = |n|_p$. We
extend the definition of the p-adic valuation $|\cdot|_p$ to all rational numbers by
setting for $m \ne 0$: $|n/m|_p = |n|_p / |m|_p$. The completion of Q with respect to
the metric $\rho_p(x,y) = |x - y|_p$ is the locally compact field of p-adic numbers
$Q_p$. It is well known (Ostrowski’s theorem, see [97]) that, up to equivalence,
$|\cdot|$ and $|\cdot|_p$ are the only nontrivial valuations on Q. The p-adic valuation
satisfies the strong triangle inequality:

$$|x + y|_p \le \max(|x|_p, |y|_p).$$

Thus the field of p-adic numbers $Q_p$ is non-Archimedean and the p-adic
metric $\rho_p$ is an ultrametric. Thus any p-adic ball $U_r(0)$ is an additive subgroup
of $Q_p$ and the ball $U_1(0)$ is also a ring. It is called the ring of p-adic integers
and denoted by $Z_p$.
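
The definition of the p-adic valuation on Q is easy to try out on concrete rationals. The following is a small sketch using Python's exact `Fraction` type; the function names `padic_order` and `padic_abs` are illustrative and not taken from the text. It can be used to spot-check the strong triangle inequality and the equality case (1.5).

```python
from fractions import Fraction


def padic_order(q: Fraction, p: int) -> int:
    """Exponent r with q = p**r * (a/b), where p divides neither a nor b (q != 0)."""
    if q == 0:
        raise ValueError("the order of 0 is not finite")
    r, num, den = 0, q.numerator, q.denominator
    while num % p == 0:
        num //= p
        r += 1
    while den % p == 0:
        den //= p
        r -= 1
    return r


def padic_abs(q, p: int) -> Fraction:
    """p-adic valuation |q|_p = p**(-order), with |0|_p = 0."""
    q = Fraction(q)
    if q == 0:
        return Fraction(0)
    return Fraction(p) ** (-padic_order(q, p))


if __name__ == "__main__":
    p = 3
    x, y = Fraction(1), Fraction(3)
    # |1|_3 = 1 and |3|_3 = 1/3 differ, so |1 + 3|_3 equals the larger one.
    print(padic_abs(x, p), padic_abs(y, p), padic_abs(x + y, p))
```

Because the values are exact rationals, comparisons such as `padic_abs(x + y, p) <= max(padic_abs(x, p), padic_abs(y, p))` involve no rounding.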

For any $x \in Q_p$ we have a unique canonical expansion (converging in the
$|\cdot|_p$-norm) of the form

$$x = \frac{\alpha_{-n}}{p^n} + \cdots + \alpha_0 + \cdots + \alpha_k p^k + \cdots,$$

where $\alpha_j = 0, 1, \ldots, p-1$ are the “digits” of the p-adic expansion. The
elements $n \in Z_p$ have the expansion

$$n = \alpha_0 + \alpha_1 p + \cdots + \alpha_k p^k + \cdots, \tag{1.7}$$

i.e., they can be identified with sequences of digits

$$n = (\alpha_0, \ldots, \alpha_k, \ldots), \qquad \alpha_j = 0, 1, \ldots, p-1. \tag{1.8}$$

If $n \in Z_p$, $n \ne 0$, and the canonical expansion (1.7) contains only a finite
number of nonzero digits $\alpha_j$, then n is a natural number (and vice versa). It
is natural to interpret a number $n \in Z_p$ such that the expansion (1.7) contains
an infinite number of nonzero digits $\alpha_j$ as an infinitely large natural number.
Thus the ring of p-adic integers contains actual infinities $n \in Z_p \setminus \mathbf{N}$, $n \ne
0$. This is one of the most important features of non-Archimedean number
systems (compare with nonstandard numbers [3]). In section 3 we introduce
a partial order structure on $Z_p$ which extends the standard order structure on
N: for $n_1, n_2 \in \mathbf{N}$, $n_1 \le n_2$ in N iff $n_1 \le n_2$ in $Z_p$. Each finite natural number
is less than any infinite number: n < m for $n \in \mathbf{N}$ and $m \in Z_p \setminus \mathbf{N}$, $m \ne 0$.
This order structure will be used to compare p-adic probabilities.
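
The digit sequences (1.8) can be computed for ordinary integers by reducing modulo $p^k$. The sketch below uses a hypothetical helper `padic_digits`, not a function from the text. It shows that a natural number has only finitely many nonzero digits, while the integer −1, viewed as an element of $Z_p$, has every digit equal to p − 1 and is therefore one of the ‘infinitely large’ elements in the sense just described.

```python
def padic_digits(n: int, p: int, k: int) -> list:
    """First k digits (alpha_0, ..., alpha_{k-1}) of the integer n regarded as an element of Z_p."""
    residue = n % p ** k  # Python's % returns a value in [0, p**k) even for negative n
    digits = []
    for _ in range(k):
        digits.append(residue % p)
        residue //= p
    return digits


if __name__ == "__main__":
    print(padic_digits(5, 2, 6))   # 5 = 1 + 0*2 + 1*4, then zeros
    print(padic_digits(-1, 3, 5))  # every digit equals p - 1 = 2
```

The second case reflects the identity $(p-1)(1 + p + \cdots + p^{k-1}) = p^k - 1 \equiv -1 \pmod{p^k}$, valid for every k.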

If, instead of a prime number p, we start from an arbitrary natural
number m > 1, we construct the system of the so-called m-adic numbers $Q_m$
(by completing Q with respect to the m-adic metric $\rho_m(x,y) = |x - y|_m$).
However, this system is not in general a field. There exist in general divisors
of zero in $Q_m$, thus $Q_m$ is only a ring. Elements of $Z_m = U_1(0)$ can be
identified with sequences (1.8) with the digits $\alpha_k = 0, 1, \ldots, m-1$. We can also
use more complicated number systems corresponding to non-homogeneous
scales: $M = (m_1, m_2, \ldots, m_k, \ldots)$, where $m_j > 1$ are natural numbers. In this
case we obtain the number system $Q_M$. The elements $x \in Z_M = U_1(0)$ can
be presented as sequences (1.8) with digits $\alpha_j = 0, 1, \ldots, m_j - 1$. The structure
of $Q_M$ is rather complicated from the mathematical point of view. In general
the number system $Q_M$ is not a ring. However, $Z_M$ is always a ring.
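
The existence of divisors of zero can be made concrete for m = 10. The sketch below is an illustration with a hypothetical function name, assuming Python 3.8 or later for the modular inverse `pow(a, -1, n)`. It computes, modulo $10^k$, the truncation of a 10-adic integer e with $e \equiv 1 \pmod{2^k}$ and $e \equiv 0 \pmod{5^k}$, together with f = 1 − e. Neither truncation is zero, yet their product vanishes modulo $10^k$; since the truncations for different k are compatible, they describe two nonzero 10-adic integers whose product is 0.

```python
def ten_adic_zero_divisors(k: int):
    """Return (e, f) modulo 10**k with e*f = 0 and e + f = 1 (both modulo 10**k)."""
    two_k, five_k, modulus = 2 ** k, 5 ** k, 10 ** k
    # Chinese remainder construction: e = 0 mod 5^k and e = 1 mod 2^k.
    e = (five_k * pow(five_k, -1, two_k)) % modulus
    f = (1 - e) % modulus
    return e, f


if __name__ == "__main__":
    e, f = ten_adic_zero_divisors(6)
    print(e, f, (e * f) % 10 ** 6)
```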

The number systems $Q_m$ and $Q_M$ could also be used to develop new
non-Kolmogorovean probabilistic models. However, the absence of a well
developed mathematical formalism does not yet permit this.

Let K be a non-Archimedean field with the valuation $|\cdot|_K$. Here the
function $n \mapsto 1/|n!|_K$ does not decrease (as $|n|_K \le 1$). The following estimate holds
in the field $Q_p$:

$$(np)^{-1}\, p^{n/(p-1)} \le \frac{1}{|n!|_p} \le p^{(n-1)/(p-1)}. \tag{1.9}$$
This estimate is a consequence of the following mathematical fact:

Lemma 1.2. Let the natural number n be written in the base p:
$n = \alpha_0 + \alpha_1 p + \cdots + \alpha_m p^m$, $\alpha_j = 0, 1, \ldots, p-1$.
Define the sum of the digits of n by $S_n = \sum_{j=0}^{m} \alpha_j$. Then

$$|n!|_p = p^{-(n - S_n)/(p-1)}. \tag{1.10}$$

Proof. There are [n/p] numbers in {1, 2, ..., n} that are divisible by p.
Here, as usual, [a] is the integer part of a. Then there are $[n/p^2]$ numbers
that are divisible by $p^2$, etc. By definition $|n!|_p = p^{-\gamma(n)}$, where $\gamma(n) =
\sum_{j=1}^{m} [n/p^j]$. For j = 1, 2, ..., m we have

$$[n/p^j] = \alpha_j + \alpha_{j+1} p + \cdots + \alpha_m p^{m-j} = p^{-j} \sum_{i=j}^{m} \alpha_i p^i.$$

Thus,

$$\gamma(n) = \sum_{j=1}^{m} p^{-j} \sum_{i=j}^{m} \alpha_i p^i = \sum_{i=1}^{m} \alpha_i p^i \sum_{j=1}^{i} p^{-j} = \sum_{i=1}^{m} \alpha_i p^i\, \frac{p^i - 1}{p^i (p-1)} = (p-1)^{-1} \sum_{i=0}^{m} \alpha_i (p^i - 1) = (p-1)^{-1} (n - S_n).$$
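
Lemma 1.2 can be checked by brute force. The sketch below uses illustrative function names and compares the exponent $\gamma(n) = \sum_{j \ge 1} [n/p^j]$ with $(n - S_n)/(p-1)$, and also with the exponent obtained by dividing n! by p directly.

```python
from math import factorial


def digit_sum(n: int, p: int) -> int:
    """S_n: sum of the base-p digits of the natural number n."""
    total = 0
    while n > 0:
        total += n % p
        n //= p
    return total


def gamma(n: int, p: int) -> int:
    """gamma(n) = [n/p] + [n/p^2] + ..., so that |n!|_p = p**(-gamma(n))."""
    total, power = 0, p
    while power <= n:
        total += n // power
        power *= p
    return total


def order_by_division(value: int, p: int) -> int:
    """Largest r such that p**r divides the positive integer value."""
    r = 0
    while value % p == 0:
        value //= p
        r += 1
    return r


if __name__ == "__main__":
    n, p = 10, 2
    # 10! = 3628800 = 2^8 * 14175, and (10 - S_10)/(2 - 1) = (10 - 2)/1 = 8.
    print(gamma(n, p), (n - digit_sum(n, p)) // (p - 1), order_by_division(factorial(n), p))
```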

By Ostrowski’s theorem the restriction of the valuation $|\cdot|_K$ to Q is
equivalent to one of the p-adic valuations: there exists p such that $|x|_K = |x|_p^l$, $l >
0$, for $x \in Q$. Thus (1.9) implies that


1 
where a = a(p,l) > 0 and b = b(p,l) > 0.

The exponential function in K is defined by the standard power series $e^x = \sum_{n=0}^{\infty} x^n/n!$.
This series converges if $|x|_K < b^{-1}$. In particular, in the p-adic case it
converges if $|x|_p < p^{1/(1-p)}$. This is equivalent to $|x|_p \le r_p$, where $r_p = 1/p$
for $p \ne 2$ and $r_2 = 1/4$. Trigonometric functions over the field K are
defined by the standard power series: $\sin x = \sum_{n=0}^{\infty} (-1)^n x^{2n+1}/(2n+1)!$ and
$\cos x = \sum_{n=0}^{\infty} (-1)^n x^{2n}/(2n)!$. These series have the same radius of convergence as the
series for the exponential function.
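
The values $r_p = 1/p$ for $p \ne 2$ and $r_2 = 1/4$ can be illustrated by looking at the p-adic order of the terms $x^n/n!$: by Lemma 1.1 the series converges exactly when these orders tend to infinity. The sketch below uses hypothetical function names and relies on the formula $\gamma(n) = \sum_{j \ge 1} [n/p^j]$ for the order of n!.

```python
def factorial_order(n: int, p: int) -> int:
    """p-adic order of n!, i.e. gamma(n) = [n/p] + [n/p^2] + ..."""
    total, power = 0, p
    while power <= n:
        total += n // power
        power *= p
    return total


def exp_term_order(x_order: int, n: int, p: int) -> int:
    """p-adic order of x**n / n! when |x|_p = p**(-x_order)."""
    return n * x_order - factorial_order(n, p)


if __name__ == "__main__":
    powers_of_two = [2 ** k for k in range(1, 11)]
    # p = 2, |x|_2 = 1/2: the order stays equal to 1 along n = 2^k, so terms do not tend to 0.
    print([exp_term_order(1, n, 2) for n in powers_of_two])
    # p = 2, |x|_2 = 1/4: the order grows at least like n + 1, so terms tend to 0.
    print([exp_term_order(2, n, 2) for n in powers_of_two])
    # p = 3, |x|_3 = 1/3: the order grows at least like n/2, so terms tend to 0.
    print([exp_term_order(1, n, 3) for n in (3, 9, 27, 81)])
```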


2 Frequency Probability Theory 


Let us provide a generalization of the von Mises frequency theory of
probability. Our main idea is very clear and it is based on the following two remarks:
1) relative frequencies $\nu_N = n/N$ always belong to the field of rational
numbers Q; 2) there exist many topologies $\tau$ on Q which are different from the
usual real topology $\tau_{\mathbf{R}}$ (corresponding to the real metric $\rho_{\mathbf{R}}(x,y) = |x - y|$).

As in ordinary Mises’ theory, we also consider infinite sequences

$$x = (x_1, \ldots, x_N, \ldots), \qquad x_j \in L, \tag{2.1}$$

of observations (here $L = \{\alpha_1, \ldots, \alpha_k\}$ is a label set). But a new topological
principle of the statistical stabilization of relative frequencies is proposed:

the statistical stabilization of relative frequencies $\nu_N(\alpha_i; x)$ can
be considered not only in the real topology on the field of rational
numbers Q but also in any other topology $\tau$ on Q.

This topology is said to be the topology of statistical stabilization.
Limiting values $P(\alpha_i) = P_x(\alpha_i)$ of $\nu_N(\alpha_i; x)$, $i = 1, \ldots, k$, are said to be
$\tau$-probabilities. These probabilities belong to the completion $\mathbf{Q}_\tau$ of $\mathbf{Q}$ with
respect to the topology $\tau$. The choice of the topology $\tau$ of statistical
stabilization is connected with the concrete probabilistic model. A sequence (2.1)
for which the principle of statistical stabilization of relative frequencies is
valid for the topology $\tau$ is said to be an $(S, \tau)$-sequence (in particular,
$(S, \tau_R)$-sequences, where $\tau_R$ is the real topology, are the ordinary (von Mises)
$S$-sequences which were considered in Chapter 1). At the moment we do not use any
$\tau$-analogue of the principle of randomness.

We are mainly interested in the following situation. The real topology
$\tau_R$ is not a topology of statistical stabilization for the sequence (2.1), but
another topology $\tau$ is. In this case we cannot consider (2.1) as a von Mises
$S$-sequence. But there is a new possibility for studying (2.1) as an $(S, \tau)$-sequence.

Set $U_Q = \{q \in \mathbf{Q} : 0 \le q \le 1\}$. We denote the closure of the set $U_Q$ in
the completion $\mathbf{Q}_\tau$ by $U_{Q_\tau}$. The following theorem is an evident consequence
of the topological principle of the statistical stabilization:

Theorem 2.1. The probabilities $P(\alpha_i)$ belong to the set $U_{Q_\tau}$ for an
arbitrary $(S, \tau)$-sequence $x$.

As usual, let us consider the algebra $F_L$ of all subsets of $L$. As in the
frequency theory of von Mises we define probabilities $P(A) = \sum_{\alpha_i \in A} P(\alpha_i)$
for $A \in F_L$. By Theorem 2.1 the probability $P(A)$ belongs to the set $U_{Q_\tau}$ for
every $A \in F_L$.

Theorem 2.2. Let the completion $\mathbf{Q}_\tau$ of $\mathbf{Q}$ with respect to the topology
of statistical stabilization $\tau$ be an additive topological group. Then for every
$(S, \tau)$-sequence $x$ the probability is an additive function on $F_L$:
$P(A \cup B) = P(A) + P(B)$, $A, B \in F_L$, $A \cap B = \emptyset$.

Here we have used only $\lim (u_N + v_N) = \lim u_N + \lim v_N$ in an additive
topological group.

Theorem 2.3. The probability $P(L) = 1$ for every topology of statistical
stabilization $\tau$ on $\mathbf{Q}$.

As in Chapter 1 we define a conditional frequency probability $P(A/B)$.

Theorem 2.4. Let $\mathbf{Q}_\tau$ be a multiplicative topological group. Then for
arbitrary $A, B \in F_L$, $P(B) \ne 0$, the Bayes formula $P(A/B) = P(A \cap B)/P(B)$
holds.

Here we have used $\lim u_N / v_N = \lim u_N / \lim v_N$ if $\lim v_N \ne 0$ in a
multiplicative topological group.

However, we may choose the topology of statistical stabilization $\tau$ such that
$\mathbf{Q}_\tau$ is not an additive group. In this case we obtain non-additive probabilities.
Further, $\mathbf{Q}_\tau$ may fail to be a topological multiplicative group. In this case we have
violations of Bayes’ formula for conditional probabilities⁴. Moreover, different
combinations of these properties are possible. For example, there exist
additive probabilities without Bayes’ formula.

⁴A simple realization of Accardi’s idea.

Now (following Kolmogorov) we can present an axiomatics corresponding
to the properties of frequency probabilities. Of course, this axiomatics
depends on the topology $\tau$. Thus we have an infinite set of axiomatic theories
$A(\tau)$. The simplest case (and the one most similar to the Kolmogorov
axiomatics) is that $\mathbf{Q}_\tau$ is a topological field. There, by definition, a $\tau$-probability
is a $U_{Q_\tau}$-valued measure with the normalization condition $P(\Omega) = 1$. There
should be technical restrictions on $P$ to provide a fruitful theory of
integration (compare with Kolmogorov’s condition of $\sigma$-additivity).

We obtain a large class of non-Kolmogorov probabilistic models if we
choose a metrizable topology $\tau$ such that the corresponding metric has the
form $\rho_\tau(x, y) = |x - y|_\tau$, where $|\cdot|_\tau$ is a valuation on $\mathbf{Q}$. According to the
Ostrovsky theorem, every valuation on $\mathbf{Q}$ is equivalent to the ordinary real
absolute value $|\cdot|_R$ or one of the $p$-adic valuations $|\cdot|_p$. Therefore we may
obtain only two classes of probabilistic models: 1) the ordinary theory of
probability (with the topology of statistical stabilization $\tau_R$); 2) one
of the $p$-adic valued probabilistic models (with topologies of statistical
stabilization $\tau_p$).

The most interesting property of $p$-adic probabilities is that $U_{Q_p} = \mathbf{Q}_p$,
see [60]. To prove this fact we need only to show that every $x \in \mathbf{Q}$ can be
realized as a limit of frequencies $\nu_N = n/N$, where $n, N$ are natural numbers,
$n \le N$. Thus any $p$-adic number $x$ may be a $p$-adic probability.

For example, every rational number may be taken as a $p$-adic probability.
There are such ‘pathological’ probabilities (from the point of view of the usual
theory of probability) as $P(A) = 2$, $P(A) = 100$, $P(A) = 5/3$, $P(A) = -1$.
If $p \equiv 1 \pmod 4$, then $i = \sqrt{-1}$ belongs to $\mathbf{Q}_p$. Thus ‘complex quantities’
can be obtained as frequency probabilities; for example, $P(A) = i = \sqrt{-1}$ or
$P(A) = 1 + i$.

Thus negative (and even complex) probabilities can be realized
as $p$-adic frequency probabilities.
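A concrete pair of frequency sequences makes the claim tangible. The sketch below is an illustration added here, not a construction taken from [60]: the choices $\nu = 2/(p^k + 1)$ and $\nu = (p^k - 1)/(p^k + 1)$ and all function names are assumptions. Both are genuine relative frequencies ($0 \le n \le N$), yet they approach 2 and $-1$ in the $p$-adic metric.

```python
from fractions import Fraction


def p_adic_norm(q: Fraction, p: int) -> Fraction:
    """|q|_p = p^(-v), where v is the exponent of p in q; |0|_p = 0."""
    if q == 0:
        return Fraction(0)
    num, den, v = q.numerator, q.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(1, p) ** v


def frequency_towards_two(k: int, p: int) -> Fraction:
    """n/N with n = 2, N = p^k + 1; tends to 0 in R and to 2 in Q_p."""
    return Fraction(2, p**k + 1)


def frequency_towards_minus_one(k: int, p: int) -> Fraction:
    """n/N with n = p^k - 1, N = p^k + 1; tends to 1 in R and to -1 in Q_p."""
    return Fraction(p**k - 1, p**k + 1)


p = 3
for k in (1, 2, 4, 8):
    nu = frequency_towards_two(k, p)
    print(k, float(nu), p_adic_norm(nu - 2, p))
```

Indeed $2/(p^k + 1) - 2 = -2p^k/(p^k + 1)$, whose $p$-adic norm is at most $p^{-k}$, while the real value of the same frequency tends to 0.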

We have presented [60] a large number of statistical models where frequencies
oscillate with respect to the real metric $\rho_R$ and stabilize with respect to
one of the $p$-adic metrics $\rho_p$. There $p$ is a parameter of the statistical model. The
corresponding statistical simulation was carried out on a computer.

Thus Mises’ principle of the statistical stabilization of frequencies can be
essentially extended by considering $(S, \tau)$-sequences for topologies $\tau$ on $\mathbf{Q}$.
It would be natural to extend Mises’ second principle, namely the principle
of randomness, and to introduce an analogue of Mises’ collective, namely a
$\tau$-collective. However, I could not obtain any meaningful extension of the
principle of randomness for the $p$-adic topologies $\tau_p$. It is still not clear how we
can define a class of place selections which would not disturb the $p$-adic
statistical stabilization. On the other hand, it is well known that in ordinary
(real) probability theory it is possible to develop the mathematical theory
of randomness by using Martin-Löf statistical recursive tests [83]-[85]. In
Chapter 6 we shall follow P. Martin-Löf and develop a $p$-adic theory of
recursive statistical tests.


3 Ensemble Probability 


Our interpretation of $p$-adic numbers
$$N = l_0 + l_1 p + \cdots + l_s p^s + \cdots, \tag{3.1}$$
where $l_s = 0, 1, \ldots, p-1$, with an infinite number of nonzero digits $l_s$ as
infinitely large numbers gives the possibility of considering numerous actual
infinities. Therefore we can study ensemble probabilities on ensembles of an
infinite volume or consider classical probabilities for an infinite number of
equally possible cases.

1. Ensembles of infinite volumes. We shall study some ensembles
$S = S_N$ which have a $p$-adic ‘volume’ $N$, where $N$ is the $p$-adic integer (3.1).
If $N$ is finite then $S$ is the ordinary finite ensemble; if $N$ is infinite then $S$
has an essentially $p$-adic structure. Consider a sequence of ensembles $M_j$ having
volumes $l_j p^j$, $j = 0, 1, \ldots$ Set
$$S = \bigcup_{j=0}^{\infty} M_j.$$
Then $|S| = N$. This split of $S$ will play the crucial role in our probabilistic
considerations. Thus $S$ is not just an arbitrary ensemble of cardinality
$N$. It is an ensemble of cardinality $N$ constructed via the hierarchical
structure corresponding to this split. We may imagine an ensemble $S$ as
being the population of a tower $T = T_S$ which has an infinite number of floors
with the following distribution of population through floors: the population of
the $j$th floor is $M_j$. Set $T_k = \bigcup_{j=0}^{k} M_j$. This is the population of the first $k + 1$ floors.
Let $A \subset S$ and let there exist
$$n(A) = \lim_{k \to \infty} n_k(A), \quad \text{where } n_k(A) = |A \cap T_k|. \tag{3.2}$$
The quantity $n(A)$ is said to be a $p$-adic volume of the set $A$.

We define the probability of $A$ by the standard proportional relation:
$$P(A) \equiv P_S(A) = \frac{n(A)}{N}. \tag{3.3}$$

⁵Of course, we understand that Martin-Löf’s theory does not give a fruitful notion
of randomness for an individual sequence of trials.
Denote the family of all $A \subset S$ for which (3.3) exists by $G_S$. The sets $A \in G_S$
are said to be events. Later we shall study some properties of the family of
events. First we consider the set algebra $F$ which consists of all finite subsets
and their complements.

Proposition 3.1. $F \subset G_S$.

Proof. Let $A$ be a finite set. Then $n(A) = |A|$ and (3.3) has the form:
$$P(A) = \frac{|A|}{|S|}. \tag{3.4}$$
Now let $B = \bar{A}$. Then $|B \cap T_k| = |T_k| - |A \cap T_k|$. Hence there exists
$\lim_{k \to \infty} |B \cap T_k| = N - |A|$. This equality implies the standard formula:
$$P(\bar{A}) = 1 - P(A). \tag{3.5}$$
□

In particular, we have: $P(S) = 1$.
Proposition 3.2. Let $A_1, A_2 \in G_S$ and $A_1 \cap A_2 = \emptyset$. Then $A_1 \cup A_2 \in G_S$
and
$$P(A_1 \cup A_2) = P(A_1) + P(A_2). \tag{3.6}$$

Proposition 3.3. Let $A_1, A_2 \in G_S$. The following conditions are equivalent:

1) $A_1 \cup A_2 \in G_S$; 2) $A_1 \cap A_2 \in G_S$;
3) $A_1 \setminus A_2 \in G_S$; 4) $A_2 \setminus A_1 \in G_S$.

There are standard formulas:
$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2), \tag{3.7}$$
$$P(A_1 \setminus A_2) = P(A_1) - P(A_1 \cap A_2). \tag{3.8}$$

Proof. We have $n_k(A_1 \cup A_2) = n_k(A_1) + n_k(A_2) - n_k(A_1 \cap A_2)$. Therefore
if, for example, $A_1 \cap A_2 \in G_S$, then there exists a limit of the right hand side.
This implies $A_1 \cup A_2 \in G_S$ and (3.7) holds. The other implications are proved in
the same way. □

Corollary 3.1. The family $G_S$ is a semi-algebra.

In general $A_1, A_2 \in G_S$ does not imply $A_1 \cup A_2 \in G_S$. To show this, by
Proposition 3.3 it suffices to find $A_1, A_2 \in G_S$ such that $A_1 \cap A_2 \notin G_S$. This is
easy to do: let $A_1, A_2 \in G_S$ be such that $|A_1 \cap A_2 \cap M_j| = 1$ for nonempty
$M_j$ (there is only one element $x \in A_1 \cap A_2$ on each nonempty floor). If $N$ is
infinite then $\lim_{k \to \infty} n_k(A_1 \cap A_2)$ does not exist. Thus

$G_S$ is not a set algebra.

It is closed only with respect to finite unions of sets which have empty
intersections. However, $G_S$ is not closed with respect to countable unions of
such sets: in general ($A_j \in G_S$, $j = 1, 2, \ldots$, $A_i \cap A_j = \emptyset$, $i \ne j$) does not
imply $A = \bigcup_{j=1}^{\infty} A_j \in G_S$. The natural additional assumptions (A) $\sum_{j=1}^{\infty} P(A_j)$
converges in $\mathbf{Q}_p$ or (a stronger assumption) (B) $\sum_{j=1}^{\infty} |P(A_j)|_p < \infty$ also
do not imply $A \in G_S$.

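Example 3.1. Let $p = 2$, $N = -1 = 1 + 2 + 2^2 + \cdots + 2^n + \cdots$.
Suppose that the sets $A_j$ have the following structure: $|A_j \cap M_{3(j-1)}| = 1$,
$|A_j \cap M_{3j-1}| = 2^{3j-1} - 1$ and $A_j \cap M_i = \emptyset$, $i \ne 3(j-1), 3j-1$, i.e., the set
$A_j$ is located on two floors of the tower $T$. In particular, $A_i \cap A_j = \emptyset$, $i \ne j$.
As $A_j \in F$, then $A_j \in G_S$; the probability $P(A_j) = -2^{3j-1}$, $j = 1, 2, \ldots$. The
series $\sum_{j=1}^{\infty} |P(A_j)|_2 < \infty$. We show that $A = \bigcup_{j=1}^{\infty} A_j \notin G_S$. We have:
$$n_{3(j-1)}(A) = |A_j \cap T_{3(j-1)}| + \Big|\bigcup_{i=1}^{j-1} A_i \cap T_{3(j-1)}\Big| = 1 + \gamma,$$
where $|\gamma|_2 < 1$. Thus $|n_{3(j-1)}(A)|_2 = 1$. But $|n_{3j-1}(A)|_2 < 1$. Hence the
sequence $n_k(A)$ has no limit in $\mathbf{Q}_2$.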
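The counting in Example 3.1 can be reproduced directly. This is a small sketch with illustrative names, assuming the tower of the example: floor $i$ holds $2^i$ inhabitants, and $A$ takes one inhabitant on each floor $3(j-1)$ and all but one on each floor $3j-1$.

```python
def floor_population_of_A(i: int) -> int:
    """|A ∩ M_i| for A = union of the sets A_j of Example 3.1 (p = 2)."""
    if i % 3 == 0:      # floor 3(j-1): a single element of A_j
        return 1
    if i % 3 == 2:      # floor 3j-1: all but one of its 2^i inhabitants
        return 2**i - 1
    return 0            # the remaining floors contain no element of A


def n_k(k: int) -> int:
    """n_k(A) = |A ∩ T_k|, the number of elements of A on floors 0..k."""
    return sum(floor_population_of_A(i) for i in range(k + 1))


def has_two_adic_norm_one(n: int) -> bool:
    """|n|_2 = 1 exactly when n is odd."""
    return n % 2 == 1


for j in range(1, 5):
    print(j, n_k(3 * (j - 1)), n_k(3 * j - 1))
```

The first column of counts is always odd (2-adic norm 1) and the second always even (norm below 1), so the partial volumes cannot converge in $\mathbf{Q}_2$.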
We note the following useful formula for computing probabilities:
$$P(A) = \sum_{j=0}^{\infty} P(A \cap M_j)$$
(the probability to find in the tower $T$ an inhabitant $\omega$ with the property $A$ is
equal to the sum of the probabilities to find an inhabitant with this property on
a fixed floor).

Definition 3.1. The system $\mathcal{P} = (S, G_S, P_S)$ is called a $p$-adic ensemble
probability space for the ensemble $S$.

If $N$ is a finite natural number then we obtain the ensemble probability
space which was considered in Chapter 1 (with $G_S = F_S$). In fact, any ensemble
probability space $\mathcal{P}$ can be approximated by ensemble probability spaces
$\mathcal{P}_{N_k}$ having ensembles of finite volumes. Set
$$N_k = l_0 + l_1 p + \cdots + l_k p^k$$
for $N$ which has the expansion (3.1). Let $l_s$ be the first nonzero digit in (3.1).
Consider finite ensembles $S_{N_k}$, $|S_{N_k}| = N_k$ ($k = s, s+1, \ldots$), and ensemble
probability spaces $\mathcal{P}_{N_k} = (S_{N_k}, G_{S_{N_k}}, P_{S_{N_k}})$. There $G_{S_{N_k}}$ coincides with the
algebra $F_{S_{N_k}}$ of all subsets of the finite ensemble $S_{N_k}$ and definition (3.3) of
ensemble probability coincides with the definition of Chapter 1:
$$P_{S_{N_k}}(A) = \frac{|A|}{|S_{N_k}|}, \quad A \in F_{S_{N_k}}. \tag{3.9}$$

We identify $S_{N_k}$ with the population of the first $k + 1$ floors of the tower $T_S$.

Proposition 3.4. Let $A \in G_S$. Then
$$P_S(A) = \lim_{k \to \infty} P_{S_{N_k}}(A \cap S_{N_k}). \tag{3.10}$$

To prove (3.10) we have only used that $\mathbf{Q}_p$ is a topological group. This
approximation depends essentially on the rule of measurement, which is
defined by the sequence $\{N_k\}$ giving an approximation of the infinite
ensemble $S$ by the finite ensembles $\{S_{N_k}\}$. In principle a change of this rule
may change the limiting result (see [60] for the details).

Proposition 3.5. (The image of ensemble probability). The probability
$P$ maps $G_S$ into the ball $U_{r_S}(0)$, where $r_S = 1/|N|_p$.

To study conditional probabilities we have to extend the notion of the
$p$-adic ensemble probability to more general ensembles.

Let $S$ be the population of the tower $T_S$ with an infinite number of floors
$M_j$, $j = 0, 1, \ldots$, and the following distribution of population: there are $m_j$
elements on the $j$th floor, $m_j \in \mathbf{N}$, and the series $\sum_{j=0}^{\infty} m_j$ converges in $\mathbf{Z}_p$
to a nonzero number $N = |S|$. We define the $p$-adic ensemble probability of
a set $A \subset S$ by (3.2), (3.3); $G_S$ is the corresponding family of events. It is
easy to check that Propositions 3.1-3.5 hold for this more general ensemble
probability.

Let $A \in G_S$ and $P(A) \ne 0$. We can consider $A$ as a new ensemble with
the $p$-adic hierarchical structure $A = \bigcup_{j=0}^{\infty} M_{Aj}$, where $M_{Aj} = A \cap M_j$, and
introduce the corresponding family of events $G_A$.

Proposition 3.6. (Conditional probability). Let $A \in G_S$, $P(A) \ne 0$ and
$B \in G_A$. Then $B \in G_S$ and Bayes’ formula
$$P_A(B) = \frac{P_S(B)}{P_S(A)} \tag{3.11}$$
holds true.

Proof. The tower $T_A$ of $A$ has the following population structure:
there are $|M_{Aj}|$ elements on the $j$th floor. In particular, $T_{Ak} = T_k \cap A$. Thus
$$n_{Ak}(B) = |B \cap T_{Ak}| = |B \cap T_k| = n_{Sk}(B) \tag{3.12}$$
for each $B \subset A$. Hence the existence of $n_A(B) = \lim_{k \to \infty} n_{Ak}(B)$ implies the
existence of $n_S(B) = \lim_{k \to \infty} n_{Sk}(B)$. Moreover, $n_S(B) = n_A(B)$. Therefore,
$$P_A(B) = \frac{n_A(B)}{n_S(A)} = \frac{n_S(B)/|S|}{n_S(A)/|S|}.$$
□
By (3.12) we obtain the following consequence:

Corollary 3.2. Let $A, B \in G_S$, $P(A) \ne 0$, and $B \subset A$. Then $B \in G_A$.

Thus we obtain
$$G_A = \{B \in G_S : B \subset A\}.$$

Let $A, B, A \cap B \in G_S$, $P(A) \ne 0$. We set by definition $P_A(B) = P_A(A \cap B)$.
Then
$$P_A(B) = \frac{P_S(B \cap A)}{P_S(A)}. \tag{3.13}$$
If we set $P_A(B) = P(B/A)$ and omit the index $S$ for the probabilities for
an ensemble $S$, then we obtain Bayes’ formula.

Remark 3.1. We have discussed many times the domain of applicability
of Bayes’ formula. This question has an exact and simple mathematical
answer in the $p$-adic ensemble probability theory. We can use Bayes’ formula
for events $A$ and $B$ iff $A \cap B$ is also an event, i.e., $A \cap B \in G_S$.

Remark 3.2. It is important for our physical considerations that $G_S$
is not a set algebra and $P_S$ can in principle take any value $x \in U_{r_S}(0)$. The
manipulations which were used to prove Bell’s inequality (Chapter 2) are
not legal for the ensemble probability space $\mathcal{P} = (S, G_S, P_S)$. For instance,
if there are three sets $B_1, B_2, B_3 \in G_S$, then in principle it may be that
$B_1 \cap B_2, B_1 \cap B_3, B_2 \cap B_3 \in G_S$, but $B_1 \cap B_2 \cap B_3 \notin G_S$. Moreover, probabilities
can in principle be negative. In this case we cannot use the standard estimates
for Kolmogorov probabilities.

2. The rules for working with p-adic probabilities. One of the main
tools of the ordinary theory of probability is based on the order structure on
the field of real numbers $\mathbf{R}$. It gives the possibility of comparing probabilities
of different events; events $E$ with probabilities $P(E) \ll 1$ are considered
as negligible and events $E$ with probabilities $P(E) \approx 1$ are considered as
practically certain. However, the use of these relations in concrete applications
is essentially based on our (real) probability intuition. What is a large
probability? What is a small probability? Moreover, it is not easy to compare
two arbitrary probabilities. For instance, do you prefer to win with the
probability $P(E_1)$ or $P(E_2)$? Formally, because $P(E_1) < P(E_2)$ it
would be better to choose $E_2$. But in practice this choice does not give many
advantages. Thus ordinary probability intuition is based more on centuries
of human experience than on exact mathematical theory.

If we want to work with $p$-adic probabilities we have to develop some
kind of $p$-adic probability intuition. However, there arises a mathematical
problem which prevents a direct generalization of the real scheme.
This is the absence of an order structure on $\mathbf{Q}_p$. Of course, we
can also do something without an order structure. For example, we can
classify (split) different events with the aid of their $p$-adic probabilities. For
instance, this works quite successfully in the frequency probability theory.
If there are two sequences $x$ and $y$ (generated by some statistical experiment)
which are not $S$-sequences in the ordinary von Mises frequency theory, then
we cannot distinguish the properties of $x$ and $y$. Both these sequences seem to be
totally chaotic from the real point of view. However, if they are $(S, \tau)$-sequences,
then it would be possible to classify them with the aid of the $p$-adic
probability distributions $P_x(\alpha_i)$, $P_y(\alpha_i)$. In the ensemble approach different
$p$-adic probabilities, $P_S(E_1) \ne P_S(E_2)$, mean that the events $E_1$ and $E_2$ have
different $p$-adic volumes.

However, we could do much more with $p$-adic probabilities by using the
partial order structure which exists on the ring of $p$-adic integers.

(O) Let $x = x_0 x_1 \ldots x_n \ldots$ and $y = y_0 y_1 \ldots y_n \ldots$ be the canonical expansions
of two $p$-adic integers $x, y \in \mathbf{Z}_p$. We set $x < y$ if there exists $n$ such that
$x_n < y_n$ and $x_k \le y_k$ for all $k > n$.

This partial order structure on $\mathbf{Z}_p$ is the natural extension of the standard
order structure on the set of natural numbers $\mathbf{N}$. It is easy to see that $x < y$
for any $x \in \mathbf{N}$ and $y \in \mathbf{Z}_p \setminus \mathbf{N}$, i.e., any finite natural number is less than any
infinite number. But we cannot compare every pair of infinite numbers.

Example 3.2. Let $p = 2$ and let $x = -1/3 = 1010\ldots1010\ldots$,
$z = -2/3 = 0101\ldots0101\ldots$ and $y = -16 = 00001111\ldots$. Then $x < y$ and $z < y$,
but the numbers $x$ and $z$ are incompatible.

It is important to remark that there exists the maximal number $N_{\max} \in \mathbf{Z}_p$.
It is easy to see that
$$N_{\max} = -1 = (p-1) + (p-1)p + \cdots + (p-1)p^n + \cdots.$$
Therefore the ensemble $S_{-1}$ is the largest ensemble which can be considered
in the $p$-adic framework.
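The order (O) can be explored on rational $p$-adic integers, whose digit expansions are eventually periodic. The sketch below is a heuristic illustration, not a decision procedure: `digits`, `less` and the parameter `horizon` are names introduced here, and the comparison inspects only the first `horizon` digits, assuming that both expansions have become periodic well within that range.

```python
from fractions import Fraction


def digits(x: Fraction, p: int, count: int) -> list:
    """First `count` canonical p-adic digits x_0, x_1, ... of a rational
    p-adic integer x (the denominator must not be divisible by p)."""
    a, b = x.numerator, x.denominator
    if b % p == 0:
        raise ValueError("x is not a p-adic integer")
    b_inv = pow(b, -1, p)            # inverse of b modulo p (Python 3.8+)
    out = []
    for _ in range(count):
        d = (a * b_inv) % p          # next digit: x = d (mod p)
        out.append(d)
        a = (a - d * b) // p         # exact division: pass to (x - d) / p
    return out


def less(x: Fraction, y: Fraction, p: int, horizon: int = 64) -> bool:
    """Finite-horizon test of the order (O): some n with x_n < y_n and
    x_k <= y_k for all later k. Only n in the first half of the horizon
    is tried, so that the tail condition is checked on a long stretch."""
    xs, ys = digits(x, p, horizon), digits(y, p, horizon)
    for n in range(horizon // 2):
        if xs[n] < ys[n] and all(xs[k] <= ys[k] for k in range(n + 1, horizon)):
            return True
    return False


x, z, y = Fraction(-1, 3), Fraction(-2, 3), Fraction(-16)
print(digits(x, 2, 8), digits(z, 2, 8), digits(y, 2, 8))
print(less(x, y, 2), less(z, y, 2), less(x, z, 2), less(z, x, 2))
```

On the numbers of Example 3.2 the sketch reports $x < y$ and $z < y$, and neither $x < z$ nor $z < x$.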
Remark 3.3. It seems to be natural to suppose that the volume of the
ensemble increases with the increase of $p$, i.e., $|S_{-1}^{(p)}| < |S_{-1}^{(q)}|$, $p < q$.

Proposition 3.7. Let $N \in \mathbf{Z}_p$, $N \ne 0$. Then $S_N \in G_{S_{-1}}$ and
$$P_{S_{-1}}(S_N) = \frac{|S_N|}{|S_{-1}|} = -N. \tag{3.14}$$

Corollary 3.3. Let $N \in \mathbf{Z}_p$, $N \ne 0$. Then $G_{S_N} \subset G_{S_{-1}}$ and the probabilities
$P_{S_N}(A)$ are calculated as conditional probabilities with respect to the
sub-ensemble $S_N$ of the ensemble $S_{-1}$:
$$P_{S_N}(A) = P_{S_{-1}}(A/S_N) = \frac{P_{S_{-1}}(A)}{P_{S_{-1}}(S_N)}, \quad A \in G_{S_N}. \tag{3.15}$$

But $A \in G_{S_{-1}}$ does not imply $A \cap S_N \in G_{S_N}$.

By Corollary 3.3 we can, in fact, restrict our considerations to the case
of the maximal ensemble $S_{-1}$. Therefore we shall study this case, $S \equiv S_{-1}$.

The (partial) order $O$ on the set of $p$-adic integers $\mathbf{Z}_p$ gives the possibility
to compare the $p$-adic volumes $n(A)$ of sets $A \in G_S$. It is natural to say that
the probability $P(B)$ is larger than the probability $P(A)$ if the $p$-adic volume $n(B)$
of $B$ is larger than the $p$-adic volume $n(A)$ of $A$. Thus we obtain the following
(partial) order on the set of probabilities:

($\tilde{O}$) $P(B) > P(A)$ iff $n(B) > n(A)$.

We use the same symbols $>$, $<$ for this new order on $\mathbf{Z}_p$. We hope that the
reader will not mix these two orders on $\mathbf{Z}_p$: the $O$-order is used to compare
$p$-adic volumes, the $\tilde{O}$-order is used to compare probabilities. For example, let
$p = 2$ and let $n(B) = -2$ $(= 0111\ldots1\ldots)$, $n(A) = -3$ $(= 1011\ldots1\ldots)$. Then
$n(B) > n(A)$ (with respect to $O$) and consequently $P(B) = 2 > P(A) = 3$
(with respect to $\tilde{O}$).

We study some properties of probabilities.

(1) As we have only a partial order structure we cannot compare the
probabilities of two arbitrary events $A$ and $B$.

(2) As $x \le -1$ with respect to $O$ for any $x \in \mathbf{Z}_p$, we have $P(A) \le 1 = P(S)$
for any $A \in G_S$.

(3) As $x \ge 0$ with respect to $O$ for any $x \in \mathbf{Z}_p$, we have $P(A) \ge 0$ for
any $A \in G_S$.

To illustrate further properties of $p$-adic probabilities, we shall use a
third order structure, namely, the usual real order structure on the set $\mathbf{Z}_p \cap \mathbf{Q}$.
In this case we shall say r-increase or r-decrease. This r-order on $\mathbf{Z}_p \cap \mathbf{Q}$ has
no probabilistic meaning. We consider this order because we want to use
‘real intuition’ to imagine the location of rational probabilities $P(A)$, $A \in G_S$,
on the real line. We shall use the symbols $[a, b]$, ..., $(a, b)$ for the corresponding
intervals of the real line. For example, let $p = 2$ and let $P(B) = 2$ and
$P(A) = 3$. Then $P(B) > P(A)$, but from the viewpoint of the r-order $P(B)$
is less than $P(A)$.

(4) Set $F^f = \{A \in G_S : n(A) \in \mathbf{N}\}$.⁶

The restriction of the order $O$ to the set of natural numbers $\mathbf{N}$ coincides
with the standard (real) order on $\mathbf{N}$. Thus $n(A) < n(B)$, $A, B \in F^f$, iff
the natural number $n(A)$ is less than the natural number $n(B)$. This implies
(by the definition of the order $\tilde{O}$ on the set of probabilities) that
$P : F^f \to (-\infty, 0) \cap \mathbf{Z}$ and $P(A)$ is increasing if $P(A)$ is r-decreasing. Therefore, for
example, probabilities $P(A) = -1$ or $-3$ are rather small with respect to
probabilities $P(B) = -100$ or $-300$.

(5) Set $\bar{F}^f = \{B = \bar{A} : A \in F^f\}$ (in particular, $\bar{F}^f$ contains the complements
of all finite subsets of $S$). Then $P : \bar{F}^f \to \mathbf{N}$ and $P(B)$ is decreasing if $P(B)$
is r-increasing. Therefore, for example, probabilities $P(E) = 100$ or $200$ are
rather small with respect to probabilities $P(C) = 1$ or $2$.

We can use these rules for conditional probabilities. For example, let
$P(B) = 100$, $P(B') = 200$, $P(A) = 2$ and $B, B' \subset A$. Then $P(B/A) = 50 >
P(B'/A) = 100$.

By (4) and (5) we can work with the probabilities of events belonging to $F^f \cup \bar{F}^f$.

(6) Now consider events $A \notin F^f \cup \bar{F}^f$. We can develop our intuition only
by examples.

Example 3.3. Let $p = 2$. Let $|A \cap M_{2k}| = 2^{2k}$ and $A \cap M_{2k+1} = \emptyset$,
$k = 0, 1, \ldots$. Then $n(A) = -1/3$ $(= 1010\ldots10\ldots)$ and $P(A) = 1/3$. Let $B \subset A$
and $B \cap M_{4k} = A \cap M_{4k}$, $B \cap M_j = \emptyset$, $j \ne 4k$. Then $n(B) = -1/15$
$(= 10001000\ldots1000\ldots)$ and $P(B) = 1/15$. It is evident that $-1/15 < -1/3$ in
$\mathbf{Z}_2$. Hence $P(B) = 1/15 < P(A) = 1/3$.
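The volume $n(A) = -1/3$ and the probability $P(A) = 1/3$ of Example 3.3 can be watched emerging from the finite approximations of Proposition 3.4. In this sketch (function names are illustrative) the tower is that of $S_{-1}$ with $p = 2$, so floor $i$ holds $2^i$ inhabitants and $|T_k| = 2^{k+1} - 1$.

```python
from fractions import Fraction


def p_adic_norm(q: Fraction, p: int) -> Fraction:
    """|q|_p = p^(-v), where v is the exponent of p in q; |0|_p = 0."""
    if q == 0:
        return Fraction(0)
    num, den, v = q.numerator, q.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(1, p) ** v


def n_k(k: int) -> int:
    """|A ∩ T_k|: A fills every even floor 0, 2, 4, ... up to floor k."""
    return sum(2**i for i in range(0, k + 1, 2))


def tower_size(k: int) -> int:
    """|T_k| = 1 + 2 + ... + 2^k for the tower of S_{-1}, p = 2."""
    return 2**(k + 1) - 1


def finite_probability(k: int) -> Fraction:
    """Ordinary proportion of A among the first k + 1 floors."""
    return Fraction(n_k(k), tower_size(k))


for k in (2, 3, 10, 11):
    pk = finite_probability(k)
    print(k, pk, float(pk), p_adic_norm(pk - Fraction(1, 3), 2))
```

In the real metric these proportions alternate between values near 2/3 (even $k$) and exactly 1/3 (odd $k$), whereas their 2-adic distance to 1/3 tends to zero.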

Thus it seems that the probabilistic order relation on the set $[0, 1] \cap \mathbf{Q}$
coincides with the standard real order. Moreover, it seems to be reasonable
to use this relation also in the case where the numbers $n(A)$ and $n(B)$ are
incompatible in $\mathbf{Z}_p$.⁷

⁶In particular, $F^f$ contains all finite subsets of $S$. $F^f$ also contains some infinite
subsets $A \in G_S$ which have finite $p$-adic volumes. For example, let $|A \cap T_k| = 1 + p^k$,
$k = 1, 2, \ldots$ ($1 + p^k$ inhabitants of the first $k + 1$ floors have the property $A$). Then $n(A) = 1$
and hence $A \in F^f$.

⁷However, probably this is the wrong extrapolation and we must assume the existence of
events with incompatible probabilities.

Example 3.4. Let $p$ and $A$ be the same as above. Let $|C \cap M_{2k+1}| = 2^{2k+1}$,
$C \cap M_{2k} = \emptyset$, $k = 0, 1, \ldots$. Then $n(C) = -2/3$ and $P(C) = 2/3$.
The numbers $n(A) = -1/3$ and $n(C) = -2/3$ are incompatible in $\mathbf{Z}_2$. But
heuristically it seems to be evident that we can use the r-order structure
on $[0, 1]$ to compare the probabilities of the events $A$ and $C$. Therefore the
probability of $\omega \in C$ is two times larger than the probability of $\omega \in A$. These
heuristic reasons were also confirmed by some frequency statistical models,
see [60] for the details.

Further we have that a probability $x \in (-\infty, 0) \cap \mathbf{Z}$ is practically negligible
with respect to any probability $y \in (0, 1] \cap \mathbf{Q}$. The intuitive argument is the
following. A probability $P(A) \in (-\infty, 0) \cap \mathbf{Z}$ is the probability of an event $A$
with a finite $p$-adic volume in the infinitely large ensemble $S$. A probability
$P(A) \in (0, 1] \cap \mathbf{Q}$ is the probability of an event $A$ with an infinite $p$-adic volume
in the infinitely large ensemble $S$.

Therefore, $p$-adics gives the possibility to split probability 0 into a set of
probabilities, $0 \to D_0^+$; in particular, $(-\infty, 0) \cap \mathbf{Z} \subset D_0^+$.

Remark 3.4. A probability $P$ on a Boolean algebra $\mathcal{A}$ is non-degenerate:
$P(A) = 0$, $A \in \mathcal{A}$, iff $A = \emptyset$. The $p$-adic split of probability 0 can be considered as
a step in the direction of Boolean probabilities. The set of new labels $D_0^+$ gives
the possibility to split many probabilities which must be equal to probability 0
from the viewpoint of real analysis. However, we still have not obtained a Boolean
probability. There are numerous events $A \in G_S$, $A \ne \emptyset$, which have probability 0.
For example, let $|A \cap T_k| = p^k$, $k = 1, 2, \ldots$. Then $P(A) = 0$.

We can also use these rules for conditional probabilities. For example,
let $P(B) = 1/15 < P(B') = 2/15$, $P(A) = 1/5$ and $B, B' \subset A$. Then
$P(B/A) = 1/3 < P(B'/A) = 2/3$. Moreover, for example, let $P(B) = -1 <
P(B') = -5$, $P(A) = -100$ and $B, B' \subset A$. Then $P(B/A) = 1/100 <
P(B'/A) = 1/20$. Thus the r-order structure on $(0, 1] \cap \mathbf{Q}$ reproduces the
rule (4).

Proposition 3.8. If $P(B) \in \mathbf{N}$, then $n(\bar{B}) \in \{0\} \cup \mathbf{N}$; if
$P(B) \in (0, 1) \cap \mathbf{Q}$ then $n(\bar{B}) \in \mathbf{Z}_p \setminus \mathbf{N}$.

Proof. If $k = P(B) \in \mathbf{N}$, then $n(B) = -k$, $k = 1, 2, \ldots$, and $n(\bar{B}) =
-1 + k$. If $a = P(B) \in (0, 1) \cap \mathbf{Q}$ then $n(B) = -a$ and $n(\bar{B}) = a - 1 \notin \mathbf{N}$. □

Thus if $P(B) \in \mathbf{N}$, then the set $\bar{B}$ has a finite $p$-adic volume $n(\bar{B})$. On
the other hand, if $P(B) \in (0, 1) \cap \mathbf{Q}$, then the set $\bar{B}$ has an infinite $p$-adic
volume $n(\bar{B})$. It is natural to assume that a probability $P(B) \in \mathbf{N}$ is larger
than any probability $P(C) \in (0, 1) \cap \mathbf{Q}$.

Therefore, $p$-adics gives the possibility to split probability 1 into a set of
probabilities, $1 \to D_1^-$. In particular, $\mathbf{N} \subset D_1^-$. However, the probability 1 is
still not totally split. There are numerous events $A \ne S$ with $P(A) = 1$. For
example, let $|A \cap T_k| = p^{[(k+1)/2]} - 1$, $k = 1, 2, \ldots$ (here $[x]$ denotes the integer
part of $x$). Then $n(A) = -1$ and $P(A) = 1$. But $A \ne S$.

We can also split all probabilities $x = P(A) \in (0, 1) \cap \mathbf{Q}$.

Let $A \in G_S$, $x = P(A) \in (0, 1) \cap \mathbf{Q}$, $C \in F^f$, $A \cap C = \emptyset$, and let $B = A \cup C$.
Then $\lambda = P(B) = P(A) + P(C) = x - k$, where $P(C) = -k$, $k \in \mathbf{N}$. As the
$p$-adic volume of the set $C$ is finite (and the ensemble $S$ is infinite) the probability
$P(C) = -k$ is infinitely small. Thus the probability $x$ can be split into a set of
probabilities $D_x^+$. Each probability $\lambda \in D_x^+$ is larger than the probability $x$ and
the probability $\Delta = \lambda - x = -k$ is infinitely small.

Let $B \in G_S$, $C \in F^f$, $B \cap C = \emptyset$, and let $A = B \cup C$, $x = P(A) \in (0, 1) \cap \mathbf{Q}$.
Then $\lambda = P(B) = P(A) - P(C) = x + k$, where $P(C) = -k$, $k \in \mathbf{N}$, is an
infinitely small probability. Thus the probability $x$ can be split into a set of
probabilities $D_x^-$. Each probability $\lambda \in D_x^-$ is less than the probability $x$ and
the probability $\Delta = x - \lambda = -k$ is infinitely small.

Thus the probability $x$ is split into a set of probabilities $D_x = D_x^- \cup D_x^+$.

We now consider probabilities with respect to an ensemble $S_N$ for an
arbitrary $N \in \mathbf{Z}_p$, $N \ne 0$. By using formula (3.15) we can translate to the
general case the results obtained for the ensemble $S = S_{-1}$. In the general case
probability 0 is split into a set $D_0^+$ which contains the set $\{\lambda = k/N : k \in \mathbf{N}\}$;
probability 1 is split into a set $D_1^-$ which contains the set $\{\lambda = 1 - k/N : k \in \mathbf{N}\}$;
a probability $x \in (0, 1) \cap \mathbf{Q}$ is split into a set $D_x = D_x^- \cup D_x^+$, where $D_x^-$, in
particular, contains the set $\{\lambda = x - k/N : k \in \mathbf{N}\}$ and $D_x^+$, in particular,
contains the set $\{\lambda = x + k/N : k \in \mathbf{N}\}$.

3. Negative probabilities and p-adic ensemble probabilities. Let
us consider Example 2.2 of Chapter 3 from the $p$-adic viewpoint. The series
$|S| = 1 + 2 + \cdots + 2^k + \cdots = -1$ converges in $\mathbf{Q}_2$. Thus the statistical ensemble $S$
of Example 2.2 has the maximal 2-adic volume $-1$. The probabilities
$p_k = |S(\lambda = \lambda_k)|/|S| = -2^k$ are infinitely small probabilities. The $p$-adic approach implies that
the distribution of quantum systems with respect to the values $\lambda = \lambda_j$ of the hidden
variables has a 2-adic hierarchical structure. The ensemble $S$ has the form
of a tower in which the $j$th floor is ‘populated’ by quantum systems $s$ with
the property $\lambda = \lambda_j$. If we assume that a preparation procedure $\mathcal{E}$ produces
portions of quantum systems in accordance with this tower structure, then
there will be extremely unstable behaviour of the properties $\lambda = \lambda_j$ in the quantum
data which will be used in an experiment (compare with [67]).
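The two 2-adic limits used here, $1 + 2 + 4 + \cdots = -1$ and $\sum_k (-2^k) = 1$, can be checked on partial sums. This is a minimal sketch with illustrative names; the 2-adic order of an integer is the number of times 2 divides it, and a sequence converges 2-adically when the order of its distance to the limit grows without bound.

```python
def two_adic_order(n: int) -> int:
    """Largest v such that 2^v divides the nonzero integer n."""
    if n == 0:
        raise ValueError("the order of 0 is infinite")
    v = 0
    while n % 2 == 0:
        n //= 2
        v += 1
    return v


def partial_volume(k: int) -> int:
    """1 + 2 + ... + 2^k, the population of the first k + 1 floors."""
    return sum(2**j for j in range(k + 1))


def partial_probability_sum(k: int) -> int:
    """p_0 + ... + p_k with p_j = 2^j / (-1) = -2^j."""
    return sum(-(2**j) for j in range(k + 1))


for k in (3, 9, 19):
    print(k,
          two_adic_order(partial_volume(k) - (-1)),
          two_adic_order(partial_probability_sum(k) - 1))
```

Both distances have 2-adic order $k + 1$, so the volume tends to $-1$ and the negative probabilities $p_k$ add up to 1.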
The summation in the formula of total probability
$$P(A) = \sum_{k=0}^{\infty} \sum_{j \in J(A)} P_{kj} \tag{3.16}$$
is meaningful from the 2-adic viewpoint for conditional probabilities $P_{kj}$ which
do not depend on $k$ (for finite sets $A$).

We now consider Example 2.3 of Chapter 3. Here conditional probabilities 
Px: = —2! are well defined in Q2. These are infinitely small probabilities. The 


summation in (3.16) is meaningful. For example, for A = {Xj, ..., Xoq, ---} We 
have 
P(A) = —2! —277) =, 
s(A) o> 1, = 


wl bo 


Ps(A) = $ -20-2 = 
1=0 j=0 
All the above series converge in $\mathbf{Q}_2$.
Finally we consider Example 2.4 of Chapter 3. By equality (1.10) the 
factorial series $ `z—o k! converges in each field Q,. Thus conditional proba- 
bilities 1 


Pki = aa 
Le=o k! 


are well defined in each Q,. 
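The convergence of the factorial series $\sum_k k!$ in every $\mathbf{Q}_p$ follows from (1.10): the $p$-adic order of $k!$ tends to infinity, so the terms tend to zero, which suffices in a non-Archimedean field. The sketch below (all names are illustrative) shows the partial sums freezing modulo $p^m$ once $k!$ is divisible by $p^m$.

```python
from math import factorial


def p_order_of_factorial(k: int, p: int) -> int:
    """Exponent of p in k!, i.e. the sum over j >= 1 of [k / p^j]."""
    order, power = 0, p
    while power <= k:
        order += k // power
        power *= p
    return order


def stabilisation_index(p: int, m: int) -> int:
    """Smallest k such that p^m divides k! (and hence every later factorial)."""
    k = 0
    while p_order_of_factorial(k, p) < m:
        k += 1
    return k


def partial_sum_mod(n_terms: int, p: int, m: int) -> int:
    """(0! + 1! + ... + (n_terms - 1)!) modulo p^m."""
    return sum(factorial(k) for k in range(n_terms)) % p**m


p, m = 2, 10
k0 = stabilisation_index(p, m)
print(k0, partial_sum_mod(k0, p, m), partial_sum_mod(k0 + 30, p, m))
```

The two printed residues coincide: adding further terms no longer changes the sum modulo $p^m$, which is the $p$-adic Cauchy property.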


4 Measures 


Let $X$ be an arbitrary set and let $R$ be a ring of subsets of $X$. The pair
$(X, R)$ is called a measurable space. The ring $R$ is said to be separating if
for every two distinct elements, $x$ and $y$, of $X$ there exists an $A \in R$ such
that $x \in A$, $y \notin A$. We shall consider measurable spaces only over separating
rings which cover $X$.

Every ring $R$ can be used as a base for a zero-dimensional⁸ topology
which we shall call the $R$-topology. This topology is Hausdorff iff $R$ is
separating.

⁸A topological space $(X, \tau)$ is zero-dimensional if each point $x \in X$ has a basis of clopen
(i.e., at the same time open and closed) neighborhoods.

Throughout this section, $R$ is a separating covering ring of a set $X$.

A subcollection $S$ of $R$ is said to be shrinking if the intersection of any
two elements of $S$ contains an element of $S$. If $S$ is shrinking, and if $f$ is a
map $R \to K$ or $R \to \mathbf{R}$, we say that $\lim_{A \in S} f(A) = 0$ if for every $\epsilon > 0$
there exists an $A_0 \in S$ such that $|f(A)| < \epsilon$ for all $A \in S$, $A \subset A_0$.

Let $K$ be a non-Archimedean field with the valuation $|\cdot|_K$.

A measure on $R$ is a map $\mu : R \to K$ with the properties:

(i) $\mu$ is additive;

(ii) for all $A \in R$, $\|A\|_\mu = \sup\{|\mu(B)|_K : B \in R, B \subset A\} < \infty$;

(iii) if $S \subset R$ is shrinking and has empty intersection, then $\lim_{A \in S} \mu(A) = 0$.

We call these conditions respectively additivity, boundedness, continuity. The
latter condition is equivalent to the following: $\lim_{A \in S} \|A\|_\mu = 0$ for every
shrinking collection $S$ with empty intersection.

Condition (iii) is the replacement for $\sigma$-additivity. Clearly (iii) implies
$\sigma$-additivity. Moreover, we shall see that for the most interesting cases (iii) is
equivalent to $\sigma$-additivity. Of course, we could in principle restrict our attention to
these cases and use the standard condition of $\sigma$-additivity. However, in that case
we should use some topological restriction on the space $X$. This implies that we
must consider some topological structure on a $p$-adic probability space. We do
not want to do this. We shall develop the theory of $p$-adic probability measures
in the same way as A. N. Kolmogorov (1933) developed the theory of real valued
probability measures, by starting with an arbitrary set algebra.

Further, we shall briefly discuss the main properties of measures, see [104]
for the details. As in Chapter 1, for any set $D$, we denote its characteristic
function by the symbol $I_D$. For $f : X \to K$ and $\phi : X \to [0, \infty)$, put
$$\|f\|_\phi = \sup_{x \in X} |f(x)|_K \, \phi(x).$$
We set
$$N_\mu(x) = \inf_{U \in R, \, x \in U} \|U\|_\mu$$
for $x \in X$. Then $\|A\|_\mu = \|I_A\|_{N_\mu}$ for any $A \in R$. We set $\|f\|_\mu = \|f\|_{N_\mu}$.
A step function (or $R$-step function) is a function $f : X \to K$ of the form
$f(x) = \sum_{k=1}^{N} c_k I_{A_k}(x)$, where $c_k \in K$ and $A_k \in R$, $A_k \cap A_l = \emptyset$, $k \ne l$. We set
for such a function
$$\int_X f(x) \, \mu(dx) = \sum_{k=1}^{N} c_k \, \mu(A_k).$$
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Denote the space of all step functions by the symbol S(X). The integral f > 
Jx f(z) (de) is the linear functional on S(X) which satisfies the inequality 


| f f(a)u(de)|ac < Ifl- (4.1) 


A function $f : X \to K$ is called $\mu$-integrable if there exists a sequence
of step functions $\{f_n\}$ such that $\lim_{n\to\infty} \|f - f_n\|_\mu = 0$.
The $\mu$-integrable functions form a vector space $L_1(X,\mu)$ (and
$S(X) \subset L_1(X,\mu)$). The integral is extended from $S(X)$ to
$L_1(X,\mu)$ by continuity. The inequality (4.1) holds for $f \in L_1(X,\mu)$.

Let $\mathcal{R}_\mu = \{A : A \subset X, I_A \in L_1(X,\mu)\}$. This is a ring.
Elements of this ring are called $\mu$-measurable sets. By setting
$\mu(A) = \int_X I_A(x)\mu(dx)$ the measure $\mu$ is extended to a measure on
$\mathcal{R}_\mu$. This is the maximal extension of $\mu$, i.e., if we repeat
the previous procedure starting with the ring $\mathcal{R}_\mu$, we will obtain
this ring again.

Set $X_\epsilon = \{x \in X : N_\mu(x) \ge \epsilon\}$,
$X_0 = \{x \in X : N_\mu(x) = 0\}$, $X_+ = X \setminus X_0$.
Every $A \subset X_0$ belongs to $\mathcal{R}_\mu$. We call such sets
$\mu$-negligible.

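Now we construct product measures. Let $\mu_j$, $j = 1,2,\dots,n$, be measures
on (separating) rings $\mathcal{R}_j$ of subsets of sets $X_j$. The finite
unions of the sets $A_1 \times \cdots \times A_n$, $A_j \in \mathcal{R}_j$,
form a (separating) ring $\mathcal{R}_1 \times \cdots \times \mathcal{R}_n$ of
subsets of $X_1 \times \cdots \times X_n$. Then there exists a unique measure
$\mu_1 \times \cdots \times \mu_n$ on
$\mathcal{R}_1 \times \cdots \times \mathcal{R}_n$ such that
$\mu_1 \times \cdots \times \mu_n(A_1 \times \cdots \times A_n) = \mu_1(A_1) \times \cdots \times \mu_n(A_n)$.
We have

$N_{\mu_1 \times \cdots \times \mu_n}(x_1, \dots, x_n) = N_{\mu_1}(x_1) \times \cdots \times N_{\mu_n}(x_n)$.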


Let $X$ be a zero-dimensional topological space (we consider only Hausdorff
spaces). We denote the ring of clopen (i.e., at the same time open and closed)
subsets of $X$ by the symbol $\mathcal{B}(X)$ (in fact, this is an algebra). We
denote the space of continuous bounded functions $f : X \to K$ by the symbol
$C_b(X)$. We use the norm $\|f\|_\infty = \sup_{x \in X} |f(x)|_K$ on this
space.

First we remark that if $X$ is compact and $\mathcal{R} = \mathcal{B}(X)$ then
the condition (iii) in the definition of a measure is redundant. If $X$ is not
compact then there exist bounded additive set functions which are not
continuous.

Let $X$ be a zero-dimensional $\mathbb{N}$-compact topological space, i.e.,
there exists a set $S$ such that $X$ is homeomorphic to a closed subset of
$\mathbb{N}^S$. We remark that every product of $\mathbb{N}$-compact spaces is
$\mathbb{N}$-compact; every closed subspace of an $\mathbb{N}$-compact space is
$\mathbb{N}$-compact. Then every bounded $\sigma$-additive function
$\mu : \mathcal{B}(X) \to K$ is a measure. On the other hand, if $X$ is a
zero-dimensional space such that every bounded $\sigma$-additive function
$\mathcal{B}(X) \to K$ is a measure, then $X$ is $\mathbb{N}$-compact.

In the theory of integration a crucial role is played by the
$\mathcal{R}_\mu$-topology, i.e., the (zero-dimensional) topology that has
$\mathcal{R}_\mu$ as a base. Of course, the $\mathcal{R}_\mu$-topology is
stronger than the $\mathcal{R}$-topology. Every $\mu$-negligible set is
$\mathcal{R}_\mu$-clopen. The following two theorems [104] will be important
for our considerations.

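Theorem 4.1. (i) If $\mu$ is a measure on $\mathcal{R}$, then $N_\mu$ is
$\mathcal{R}$-upper semicontinuous (hence, $\mathcal{R}_\mu$-upper
semicontinuous) and for every $A \in \mathcal{R}_\mu$ and $\epsilon > 0$ the
set $A_\epsilon = A \cap X_\epsilon$ is $\mathcal{R}_\mu$-compact.

(ii) Conversely, let $\mu : \mathcal{R} \to K$ be additive. Assume that there
exists an $\mathcal{R}$-upper semicontinuous $\phi : X \to [0,\infty)$ such
that $|\mu(A)|_K \le \sup_{x \in A} \phi(x)$, $A \in \mathcal{R}$, and
$\{x \in A : \phi(x) \ge \epsilon\}$ is $\mathcal{R}$-compact
($A \in \mathcal{R}$, $\epsilon > 0$). Then $\mu$ is a measure and
$N_\mu \le \phi$.

Theorem 4.2. Let $\mu : \mathcal{R} \to K$ be a measure. A function
$f : X \to K$ is $\mu$-integrable iff it has the following two properties:
(1) $f$ is $\mathcal{R}_\mu$-continuous; (2) for every $\epsilon > 0$, the set
$\{x : |f(x)|_K N_\mu(x) \ge \epsilon\}$ is $\mathcal{R}_\mu$-compact.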

We shall also use the following fact. 

Theorem 4.3. Let $f \in L_1(X,\mu)$ and let

$\int_A f(x)\mu(dx) = 0$ for every $A \in \mathcal{R}$.    (4.2)

Then supp $f \subset X_0$.

Proof. Let us assume that $f$ satisfies (4.2) and there exists $x_0 \in X_+$
(hence $N_\mu(x_0) = a > 0$) such that $|f(x_0)|_K = c > 0$. Let $\{f_k\}$ be a
sequence of $\mathcal{R}$-step functions which approximates $f$. For every
$\epsilon > 0$ there exists $N_\epsilon$ such that
$\|f - f_k\|_\mu < a\epsilon$ for all $k \ge N_\epsilon$. In particular, this
implies that $|f_k(x_0)|_K \ge c - \epsilon$, $k \ge N_\epsilon$. Then we have

$\Delta_{Bk} = |\int_B f_k(x)\mu(dx)|_K = |\int_B f_k(x)\mu(dx) - \int_B f(x)\mu(dx)|_K < a\epsilon$, $B \in \mathcal{R}$.

Let

$f_k(x) = \sum_j c_{kj} I_{B_{kj}}(x)$, $c_{kj} \in K$, $B_{kj} \in \mathcal{R}$, $B_{ki} \cap B_{kj} = \emptyset$, $i \ne j$,

and let $x_0 \in B_{kj_0}$. If $B \subset B_{kj_0}$, $B \in \mathcal{R}$, then
$\Delta_{Bk} = |c_{kj_0}|_K |\mu(B)|_K = |f_k(x_0)|_K |\mu(B)|_K < a\epsilon$.
On the other hand, as $\|B_{kj_0}\|_\mu \ge a$, for every $\delta > 0$ there
exists $B \subset B_{kj_0}$, $B \in \mathcal{R}$, such that
$|\mu(B)|_K > a - \delta$. Thus we obtain for this $B$:
$\Delta_{Bk} > (a - \delta)(c - \epsilon)$. By choosing $\epsilon > 0$,
$\delta > 0$ such that $(a - \delta)(c - \epsilon) > a\epsilon$ we arrive at a
contradiction. ∎

We shall use the following simple fact. 

Lemma 4.1. Let $(X_j, \mathcal{R}_j)$, $j = 1,2$, be measurable spaces and let
$f : X_1 \to X_2$ be measurable. If $\mathcal{S}$ is shrinking in
$\mathcal{R}_2$ then $f^{-1}(\mathcal{S})$ is shrinking in $\mathcal{R}_1$. If
$\mathcal{S}$ has empty intersection, then $f^{-1}(\mathcal{S})$ also has empty
intersection.

Lemma 4.2. Let $(X_j, \mathcal{R}_j)$, $j = 1,2$, be measurable spaces and let
$\eta : X_1 \to X_2$ be a measurable function. Then, for every measure
$\mu : \mathcal{R}_1 \to K$, the function $\mu_\eta : \mathcal{R}_2 \to K$
defined by the equality $\mu_\eta(A) = \mu(\eta^{-1}(A))$ is a measure on
$\mathcal{R}_2$ and, for every $\mathcal{R}_2$-continuous function
$h : X_2 \to K$, the following inequality holds:

$\|h\|_{\mu_\eta} \le \|h \circ \eta\|_\mu$.    (4.3)
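Proof. We have for every $A \in \mathcal{R}_2$,

$\|A\|_{\mu_\eta} = \sup\{|\mu(\eta^{-1}(B))|_K : B \in \mathcal{R}_2, B \subset A\} \le \|\eta^{-1}(A)\|_\mu < \infty$.    (4.4)

Thus $\mu_\eta$ is bounded. We now prove that $\mu_\eta$ is continuous on
$\mathcal{R}_2$. Let $\mathcal{S}$ be shrinking in $\mathcal{R}_2$ with empty
intersection. By Lemma 4.1, $\eta^{-1}(\mathcal{S})$ is shrinking in
$\mathcal{R}_1$ and also has empty intersection. By (4.4) we obtain that
$\lim_{A \in \mathcal{S}} \|A\|_{\mu_\eta} = 0$.

We prove inequality (4.3). Let $h : X_2 \to K$ be $\mathcal{R}_2$-continuous.
We wish to prove that $|h(b)|_K N_{\mu_\eta}(b) \le \|h \circ \eta\|_\mu$ for
all $b \in X_2$. So we choose $b \in X_2$ with $h(b) \ne 0$. Then the set
$C_b = \{y \in X_2 : |h(y)|_K = |h(b)|_K\}$ is $\mathcal{R}_2$-open. Hence
there is a $B \in \mathcal{R}_2$ with $b \in B \subset C_b$. Then

$|h(b)|_K N_{\mu_\eta}(b) \le |h(b)|_K \|B\|_{\mu_\eta} \le |h(b)|_K \|\eta^{-1}(B)\|_\mu = \sup_{x \in \eta^{-1}(B)} |h(b)|_K N_\mu(x) \le \sup_{x \in \eta^{-1}(B)} |(h \circ \eta)(x)|_K N_\mu(x) \le \|h \circ \eta\|_\mu$. ∎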

The following theorem on the change of variables will be important in 
our probabilistic considerations. 

Theorem 4.4. (Khrennikov, van Rooij) Let $(X_j, \mathcal{R}_j)$, $j = 1,2$, be
measurable spaces, let $\eta : X_1 \to X_2$ be a measurable function, and let
$\mu : \mathcal{R}_1 \to K$ be a measure. If $f : X_2 \to K$ is an
$\mathcal{R}_2$-continuous function such that the function $f \circ \eta$
belongs to $L_1(X_1, \mu)$, then $f \in L_1(X_2, \mu_\eta)$ and

$\int_{X_1} f(\eta(x))\mu(dx) = \int_{X_2} f(y)\mu_\eta(dy)$.

Proof. It suffices to prove that for every $\epsilon > 0$ there exists an
$\mathcal{R}_2$-step function $g$ such that $\|f - g\|_{\mu_\eta} \le \epsilon$
and $\|f \circ \eta - g \circ \eta\|_\mu \le \epsilon$. By (4.3) the first
follows from the second. So we fix $\epsilon > 0$.

By Theorem 4.2 the set

$A = \{x \in X_1 : |(f \circ \eta)(x)|_K N_\mu(x) \ge \epsilon\}$

is $\mathcal{R}_\mu$-compact and therefore contained in an element of
$\mathcal{R}_1$. But $N_\mu$ is bounded on every element of $\mathcal{R}_1$, so
$N_\mu$ is bounded on $A$. We choose $\delta > 0$ so that

$\delta N_\mu(x) < \epsilon$ for all $x \in A$.

As $A$ is compact, $f(\eta(A))$ is also compact. We can cover $f(\eta(A))$ by
disjoint closed balls of radius $\delta$:
$f(\eta(A)) \subset U_\delta(a_0) \cup \dots \cup U_\delta(a_N)$, where $a_0$
is chosen to be 0 in order to obtain:

$|a_n|_K \le |t|_K$ for $t \in U_\delta(a_n)$, $n = 0,1,\dots,N$.    (4.5)

For each $n$, $\mathcal{C}_n = \{C \in \mathcal{R}_2 : C \subset f^{-1}(U_\delta(a_n))\}$
is a collection of open sets covering the compact set
$\eta(A) \cap f^{-1}(U_\delta(a_n))$. Thus, for each $n$ there is a
$C_n \in \mathcal{C}_n$ such that
$\eta(A) \cap f^{-1}(U_\delta(a_n)) \subset C_n$. We now have

$C_0, \dots, C_N \in \mathcal{R}_2$,

$C_n \subset f^{-1}(U_\delta(a_n))$, $n = 0,1,\dots,N$.

Put $g(x) = \sum_{n=0}^{N} a_n I_{C_n}(x)$. Then $g$ is an
$\mathcal{R}_2$-step function. We wish to show that, for all $a \in X_1$,

$\Delta(a) = |(f \circ \eta)(a) - (g \circ \eta)(a)|_K N_\mu(a) < \epsilon$.

Thus, take $a \in X_1$:

(1) If $a \in A$, then there is a unique $n$ with $\eta(a) \in C_n$. Then
$\Delta(a) = |(f \circ \eta)(a) - a_n|_K N_\mu(a) \le \delta N_\mu(a) < \epsilon$.

(2) If $a \notin A$, but $\eta(a) \in C_n$ for some $n$, then by (4.5) we
obtain that
$\Delta(a) = |(f \circ \eta)(a) - a_n|_K N_\mu(a) \le |(f \circ \eta)(a)|_K N_\mu(a) < \epsilon$.

(3) If $\eta(a) \notin C_0 \cup \dots \cup C_N$, then $g(\eta(a)) = 0$. Thus
$\Delta(a) = |(f \circ \eta)(a)|_K N_\mu(a) < \epsilon$ (as $a \notin A$). ∎
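
On a finite set the change of variables formula of Theorem 4.4 reduces to a
regrouping of a finite sum, which can be checked directly. The sketch below
assumes a finitely supported measure with rational values; the helper names
`pushforward` and `integral` and the sample data are hypothetical.

```python
from fractions import Fraction

def pushforward(mu: dict, eta: dict) -> dict:
    """mu_eta({y}) = mu(eta^{-1}({y})) for a measure given by point masses."""
    out = {}
    for x, w in mu.items():
        out[eta[x]] = out.get(eta[x], Fraction(0)) + w
    return out

def integral(f, mu: dict) -> Fraction:
    """Integral of f against a measure given by point masses."""
    return sum((f(x) * w for x, w in mu.items()), Fraction(0))

# Point masses may be negative: only additivity and normalization are used.
mu = {0: Fraction(1, 3), 1: Fraction(5, 3), 2: Fraction(-1)}
eta = {0: "u", 1: "v", 2: "u"}            # map X_1 -> X_2
f = {"u": Fraction(2), "v": Fraction(-7, 2)}.get

lhs = integral(lambda x: f(eta[x]), mu)   # integral of f(eta(x)) mu(dx)
rhs = integral(f, pushforward(mu, eta))   # integral of f(y) mu_eta(dy)
```

Both sides equal $-43/6$ here. The finite case does not exercise the continuity
and compactness hypotheses, which are what the proof above actually needs.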

Open problem. Find a condition on functions $f$ which is weaker than
continuity but still implies the formula of the change of variables.

Further we shall obtain some properties of measures which are specific
for measures defined on algebras.

Throughout this section, $\mathcal{A}$ is a separating algebra of subsets of a
set $X$. First we remark that if we start with a measure $\mu$ defined on the
algebra $\mathcal{A}$ then the system $\mathcal{A}_\mu$ of $\mu$-integrable
sets is again an algebra.

Proposition 4.1. Let $\mu : \mathcal{A} \to K$ be a measure. Then for each
$\epsilon > 0$, the set $X_\epsilon$ is $\mathcal{A}_\mu$-compact.

This fact is a consequence of Theorem 4.1.

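Proposition 4.2. Let $\mu : \mathcal{A} \to K$ be a measure. Then the algebra
$\mathcal{B}(X)$ of $\mathcal{A}_\mu$-clopen sets coincides with the algebra
$\mathcal{A}_\mu$.

Proof. We use Theorem 4.2 and the previous proposition. Let
$B \in \mathcal{B}(X)$. Then $I_B$ is $\mathcal{A}_\mu$-continuous and
$\{x : |I_B(x)|_K N_\mu(x) \ge \epsilon\} = B \cap X_\epsilon$. As $B$ is
closed and $X_\epsilon$ is compact, $B \cap X_\epsilon$ is compact. Thus
$\mathcal{B}(X) \subset \mathcal{A}_\mu$. The converse inclusion holds because
$\mathcal{A}_\mu$ is an algebra and a base of the topology, so each of its
elements is clopen. ∎

As a consequence of Proposition 4.2, we obtain that
$C_b(X) \subset L_1(X,\mu)$ (for the space $X$ endowed with the
$\mathcal{A}_\mu$-topology) and the following inequality holds:

$|\int_X f(x)\mu(dx)|_K \le \|f\|_\infty \|X\|_\mu$, $f \in C_b(X)$.

Let $X$ be a zero-dimensional topological space. A measure $\mu$ defined on the
algebra $\mathcal{B}(X)$ of the clopen sets is called a tight measure. Thus by
Proposition 4.2 every measure $\mu : \mathcal{A} \to K$ is extended to a tight
measure on the space $X$ endowed with the $\mathcal{A}_\mu$-topology.

Proposition 4.3. Let $\mu : \mathcal{A} \to K$ be a measure and let
$f \in L_1(X,\mu)$. Then $f$ is $(\mathcal{A}_\mu, \mathcal{B}(K))$-measurable.

Proof. By Theorem 4.2 $f$ is $\mathcal{A}_\mu$-continuous. Thus
$f^{-1}(\mathcal{B}(K)) \subset \mathcal{B}(X)$. But by Proposition 4.2 we have
that $\mathcal{A}_\mu = \mathcal{B}(X)$. ∎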


5 p-adic probability space 


Let $\mu : \mathcal{A} \to \mathbb{Q}_p$ be a measure defined on a separating
algebra $\mathcal{A}$ of subsets of the set $\Omega$ which satisfies the
normalization condition $\mu(\Omega) = 1$. We set $\mathcal{F} = \mathcal{A}_\mu$
and denote the extension of $\mu$ on $\mathcal{F}$ by the symbol $P$. A triple
$(\Omega, \mathcal{F}, P)$ is said to be a p-adic probability space ($\Omega$
is a sample space, $\mathcal{F}$ is an algebra of events, $P$ is a
probability).

As in general measure theory we set

$\Omega_\alpha = \{\omega \in \Omega : N_P(\omega) \ge \alpha\}$, $\alpha > 0$, $\Omega_+ = \bigcup_{\alpha > 0} \Omega_\alpha$, $\Omega_0 = \Omega \setminus \Omega_+$.

If a property $\Xi$ is valid on the subset $\Omega_+$ we say that $\Xi$ is
valid a.e. (mod $P$).

Everywhere below $(G, \Gamma)$ denotes a measurable space over the algebra
$\Gamma$. Functions $\xi : \Omega \to G$ which are
$(\mathcal{F}, \Gamma)$-measurable are said to be random variables.

Everywhere below $Y$ is a zero-dimensional topological space. We consider
$Y$ as the measurable space over the algebra $\mathcal{B}(Y)$. Every random
variable $\xi : \Omega \to Y$ is continuous in the $\mathcal{F}$-topology. In
particular, $\mathbb{Q}_p$-valued random variables are
$(\mathcal{F}, \mathcal{B}(\mathbb{Q}_p))$-measurable functions. If
$\xi \in L_1(\Omega, P)$, we introduce an expectation of this random variable
by setting $E\xi = \int_\Omega \xi(\omega)P(d\omega)$. We note that every
bounded random variable $\xi : \Omega \to \mathbb{Q}_p$ belongs to
$L_1(\Omega, P)$.

Let $\eta : \Omega \to G$ be a random variable. The measure $P_\eta$ is said to
be a distribution of the random variable. By Theorem 4.4 we have that

$E f(\eta) = \int_G f(y)P_\eta(dy)$    (5.1)

for every $\Gamma$-continuous function $f : G \to \mathbb{Q}_p$ such that
$f \circ \eta \in L_1(\Omega, P)$. In particular, we have the following result.

Proposition 5.1. Let $\eta : \Omega \to Y$ be a random variable and let
$f \in C_b(Y)$. Then the formula (5.1) holds.

We shall also use the following technical result.

Proposition 5.2. Let $\eta : \Omega \to Y$ be a random variable, let
$\zeta \in L_1(\Omega, P)$, and let $f \in C_b(Y)$. Then
$\xi(\omega) = \zeta(\omega) f(\eta(\omega))$ belongs to $L_1(\Omega, P)$ and

$E\xi = \int_{\mathbb{Q}_p \times Y} x f(y) P_z(dx\,dy)$, $z(\omega) = (\zeta(\omega), \eta(\omega))$.

Proof. We have only to show that $\xi \in L_1(\Omega, P)$. This fact is a
consequence of Theorem 4.2. ∎


The random variables $\xi, \eta : \Omega \to G$ are called independent if

$P(\xi \in A, \eta \in B) = P(\xi \in A)P(\eta \in B)$ for all $A, B \in \Gamma$.    (5.2)

Proposition 5.3. Let $\xi, \eta : \Omega \to Y$ be independent random variables
and let $f, g \in C_b(Y)$. Then we have:

$E f(\xi) g(\eta) = E f(\xi)\, E g(\eta)$.    (5.3)

Proof. If $f$ and $g$ are locally constant functions then (5.3) is a
consequence of (5.2). Arbitrary functions $f, g \in C_b(Y)$ can be approximated
by locally constant functions (with the convergence of corresponding integrals)
by using the technique developed in the proof of Theorem 4.4. ∎

Remark 5.1. In fact, the formula (5.3) is valid for continuous $f, g$ such that
the random variables $f(\xi)$, $g(\eta)$ and $f(\xi)g(\eta)$ belong to
$L_1(\Omega, P)$.

Proposition 5.4. Let $\xi$ and $\eta$ be independent random variables. Then
the random vector $z = (\xi, \eta)$ has the probability distribution
$P_z = P_\xi \times P_\eta$.

This fact is the direct consequence of (5.2).
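
The product formula (5.3) can be checked by hand on a finite sample space. The
sketch below is an illustration under assumptions not made in the text: the
sample space is $\{0,1\} \times \{0,1\}$, the probabilities are rational
numbers (one of them negative, which a p-adic probability allows), and the
names `expect`, `xi`, `eta` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Marginal weights; each family sums to 1, one weight is negative.
a = [Fraction(3), Fraction(-2)]
b = [Fraction(1, 2), Fraction(1, 2)]

# Product probability on the four-point sample space: P(i, j) = a_i * b_j.
P = {(i, j): a[i] * b[j] for i, j in product(range(2), repeat=2)}

def expect(h):
    """Expectation of h with respect to P (a finite sum)."""
    return sum((h(w) * pw for w, pw in P.items()), Fraction(0))

xi = lambda w: w[0]    # first coordinate
eta = lambda w: w[1]   # second coordinate
f = [Fraction(5), Fraction(7)]
g = [Fraction(1), Fraction(-3)]

lhs = expect(lambda w: f[xi(w)] * g[eta(w)])
rhs = expect(lambda w: f[xi(w)]) * expect(lambda w: g[eta(w)])
```

Here $E f(\xi) = 1$, $E g(\eta) = -1$, and both sides of (5.3) equal $-1$. The
computation uses only additivity and the factorization (5.2), not positivity of
the weights.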
Let $\xi$ and $\eta$ be respectively $\mathbb{Q}_p$-valued and $G$-valued
random variables and $\xi \in L_1(\Omega, P)$. A conditional expectation
$E[\xi|\eta = y]$ is defined as a function $m \in L_1(G, P_\eta)$ such that

$\int_{\{\omega \in \Omega : \eta(\omega) \in B\}} \xi(\omega)P(d\omega) = \int_B m(y)P_\eta(dy)$ for every $B \in \Gamma$.

Proposition 5.5. The conditional expectation is defined uniquely a.e.
mod $P_\eta$.

Proof. We assume that there exist two conditional expectations
$m_j \in L_1(G, P_\eta)$, $j = 1,2$, such that $m_1(x_0) \ne m_2(x_0)$ at some
point $x_0$ with $N_{P_\eta}(x_0) > 0$. Set $m(x) = m_1(x) - m_2(x)$. We have
$\int_B m(x)P_\eta(dx) = 0$ for every $B \in \Gamma$. To obtain the
contradiction, it is sufficient to use Theorem 4.3. ∎

As there is no analogue of the Radon-Nikodym theorem in the non-Archimedean
case [104], it may happen that a conditional expectation does not exist.
Everywhere below we assume that $m(y) = E[\xi|\eta = y]$ is well defined and
moreover, that it belongs to the class $C_b(Y)$.

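Proposition 5.6. Let $\xi : \Omega \to \mathbb{Q}_p$, $\eta : \Omega \to Y$ be
random variables, and $\xi \in L_1(\Omega, P)$. The equality

$E f(\eta)\xi = E f(\eta(\omega))\, E[\xi(\omega)|\eta = \eta(\omega)]$

holds for every function $f \in C_b(Y)$.

Proof. By Proposition 5.2 we obtain
$E\xi f(\eta) = \int_{\mathbb{Q}_p \times Y} x f(y)P_z(dx\,dy)$,
where $z(\omega) = (\xi(\omega), \eta(\omega))$. Set for
$A \in \mathcal{B}(Y)$,

$\lambda(A) = \int_{\mathbb{Q}_p \times Y} x I_A(y)P_z(dx\,dy)$.

As $\lambda(A) = \int_{\eta^{-1}(A)} \xi(\omega)P(d\omega) = \int_A m(y)P_\eta(dy)$,
it is a tight measure on $Y$. Then

$\int_{\mathbb{Q}_p \times Y} x f(y)P_z(dx\,dy) = \int_Y f(y)\lambda(dy) = \int_Y f(y)m(y)P_\eta(dy) = E f(\eta)m(\eta)$. ∎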


Chapter 5 


Information and Probability 


The title of this chapter may give the impression that we shall discuss
the well known connection between probability and information based on
entropy. However, we shall discuss ideas which differ sharply from Shannon's
idea that by specifying probabilities of various states it is possible to
obtain a quantitative measure of information. In contrast to Shannon's
approach we shall demonstrate that, in fact, information induces probability.
In particular, we reduce subjective probability to the ensemble probability
(proportion) on the space of human ideas. This viewpoint clarifies the use
of subjective probabilities and implies some consequences for cognitive
sciences (in particular, for psychology).


1 Information reality 


Our information-probabilistic considerations are based on a new model of
reality. In our model physical reality is a reality of information. The
reality of the 17th-20th centuries, Newton's reality (or material reality), is
just a part of general information reality. Newton's model of reality gives
the description of material systems and motions of such systems (as well as
fields associated with material systems). From our point of view material
systems give only a particular class of information systems, transformers of
information. Besides material transformers of information, there exist purely
information systems and processes. In particular, the phenomenon of
consciousness cannot be understood on the basis of Newton's model of reality.
We propose a model of purely information reality and develop analogues of
classical (Newton's) mechanics on information spaces (in particular, spaces of
ideas for cognitive systems).

The mathematical basis of Newton's classical mechanics (as well as Einstein's
relativity) is given by the system of real numbers $\mathbb{R}$ (see
footnote 1). This system describes well the dynamics of material systems.
However, it is not so useful for studying information processes and, in
particular, cognitive processes. In fact, the absence of adequate mathematical
models for cognitive information processes is a consequence of the use of real
analysis. It seems that real analysis cannot provide an adequate description
of cognitive processes. By studying configurations of excited neurons in a
domain Dg (called a brain) of real space we would never understand cognitive
information processes and, in particular, the phenomenon of consciousness. We
have to change the number system and use non-real number systems which provide
a more adequate description of cognitive information processes. In fact, since
the creation of quantum mechanics, it has been clear that the mathematical
basis of physics must be changed. However, Newton's mathematical model of
space was incorporated into the new (quantum) physics. Moreover, the domain
of Newton's model was extended to include information phenomena. On the one
hand, such a situation can be explained by the great positive experience of
the use of real numbers in physics and technology. On the other hand, it
is a consequence of scientific psychology. It seems that Newton's model of
material reality (and, in particular, real analysis) became a kind of
scientific religion. Only phenomena described by this model are recognized as
physical phenomena.

The ability to form associations is one of the main features of information
systems. To describe dynamics in information spaces (and, in particular,
spaces of ideas), it is natural to use number systems which describe the
ability to form associations. One such number system is the system of m-adic
numbers. From the information viewpoint the ring of m-adic integers
$\mathbb{Z}_m$ can be constructed in the following way. Let
$A_m = \{0,1,\dots,m-1\}$, where $m \in \mathbb{N}$, $m > 1$, be an alphabet.
We consider infinite information strings with respect to this alphabet

$x = (\alpha_0, \alpha_1, \dots, \alpha_N, \dots)$, $\alpha_j \in A_m$.


1 We think that the transition from the Euclidean real space to real Riemannian
manifolds does not change very much Newton's model of material reality.

We introduce the following metric on the space of information strings. Let
$x = (\alpha_j)$ and $y = (\beta_j)$. Then we set

$\rho_m(x, y) = \frac{1}{m^k}$ if $\alpha_j = \beta_j$, $j = 0,\dots,k-1$, and $\alpha_k \ne \beta_k$.

This metric describes the nearness in the sense of associations of information 
strings x and y : two information strings (ideas) are close if their first (and 
the most important) information registers coincide. 
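
The metric can be sketched for finite prefixes of information strings. This is
a minimal illustration, assuming strings are given as Python lists of digits;
the function name `rho` is ours.

```python
from fractions import Fraction

def rho(x, y, m: int) -> Fraction:
    """rho_m(x, y) = 1/m**k, where k is the first index at which the digit
    strings differ; 0 if the given prefixes coincide."""
    for k, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return Fraction(1, m ** k)
    return Fraction(0)

x = [1, 0, 2]
y = [1, 0, 1]
z = [1, 2, 2]
d_xy, d_yz, d_xz = rho(x, y, 3), rho(y, z, 3), rho(x, z, 3)
```

With these strings $\rho_3(x,y) = 1/9$ and $\rho_3(y,z) = \rho_3(x,z) = 1/3$,
so the strong triangle inequality
$\rho_m(x,z) \le \max(\rho_m(x,y), \rho_m(y,z))$ holds, as expected for a
metric of this type.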

Each information string $x = (\alpha_j)_{j=0}^{\infty}$ is identified with a
number

$x = \sum_{j=0}^{\infty} \alpha_j m^j$.

Such numbers form a system (ring) of m-adic integers $\mathbb{Z}_m$.
Geometrically elements $x \in \mathbb{Z}_m$ can be represented as branches of a
homogeneous m-adic tree. Thus we can define addition, subtraction and
multiplication of branches (which represent infinite information strings,
ideas). As has already been mentioned in Chapter 4, if $m = p > 1$ is a prime
number, then it is possible to define division.
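
Addition of information strings works digit by digit with carries propagating
towards higher indices. The following sketch works with truncated strings (the
first few digits only), so it computes modulo $m^n$; the helper names
`to_digits` and `add` are illustrative.

```python
def to_digits(n: int, m: int, length: int) -> list:
    """First `length` m-adic digits of a non-negative integer,
    digit alpha_0 first."""
    return [(n // m ** j) % m for j in range(length)]

def add(x, y, m: int) -> list:
    """Add two truncated m-adic integers digit by digit with carry."""
    out, carry = [], 0
    for a, b in zip(x, y):
        s = a + b + carry
        out.append(s % m)
        carry = s // m
    return out  # the last carry is dropped: arithmetic modulo m**len(x)

seven = to_digits(7, 3, 5)      # [1, 2, 0, 0, 0]
five = to_digits(5, 3, 5)       # [2, 1, 0, 0, 0]
twelve = add(seven, five, 3)

minus_one = [2, 2, 2, 2, 2]     # truncation of the 3-adic string (2, 2, 2, ...)
zero = add(minus_one, to_digits(1, 3, 5), 3)
```

The second computation shows that the string $(m-1, m-1, \dots)$ plays the role
of $-1$: adding 1 to it gives the zero string, which is why subtraction is
available on the whole tree.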

In fact, it is possible to use more general number systems which geomet- 
rically can be realized as trees with nonconstant number of branches leaving 
each vertex (see [65]). 


2 Dynamics on information spaces 


Everywhere below the abbreviation ‘I’ is used for the word information.

We choose the space $X = \mathbb{Z}_p$ (or multidimensional spaces
$X = \mathbb{Z}_p^N$) for the description of information. The space $X$ is said
to be an information space (see footnote 2).

Objects which ‘live’ in I-spaces are said to be transformers of information
(I-transformers). I-transformers are not characterized by localization in
information p-adic space (or real space). They are characterized by the ability
to receive external information and transform it into new information.

2 We do not assume that $X$ describes the whole information in the universe. By
our philosophy there is no absolute physical space (which is typically
identified with the real space). There is also no absolute information space.
Different information phenomena can be described by different mathematical
models for I-spaces. The p-adic model for I-spaces is the simplest from the
mathematical point of view.

Each I-transformer $\tau$ has internal clocks. A state of the clocks is
described by an I-vector $t \in T = \mathbb{Z}_p$ which is called information
time. The I-time can have different interpretations in different I-models. If
$\tau$ is a conscious system then $t$ is the (self-recognized) time of the
evolution of this system. We can speak about the psychological time of an
individual or about the (collective) social time of a group of individuals. In
fact, we need not imagine $t$ as an ordered sequence of time counts. It is only
information which describes the evolution of $\tau$. In principle, there is no
direct relation between I-time and ‘physical’ time that is used in the model
over the reals.

At each instant $t \in T$ of I-time there is defined a total information state
(I-state) $q(t) \in X$ of $\tau$. It describes the position of $\tau$ in the
I-space $X$. The ‘life’-trajectory of $\tau$ can be identified with the
trajectory $q(t)$ in $X$.

We use an analogue of the Hamiltonian dynamics on the I-spaces (see
footnote 3). As usual, we introduce the quantity $p(t) = \dot{q}(t)$
($= \frac{d}{dt}q(t)$) which is the information analogue of the momentum.
However, here we prefer to use a psychological terminology. The quantity $p(t)$
is said to be a motivation (for changing the I-state $q(t)$).

The space $\mathbb{Z}_p \times \mathbb{Z}_p$ of points $z = (q,p)$ where $q$ is
the I-state and $p$ is the motivation is said to be a phase I-space. As in the
ordinary Hamiltonian formalism, we assume that there exists a function
$H(q,p)$ (I-Hamiltonian) on the phase I-space which determines the motion of
$\tau$ in the phase I-space:

$\dot{q}(t) = \frac{\partial H}{\partial p}(q(t), p(t))$, $q(t_0) = q_0$; $\dot{p}(t) = -\frac{\partial H}{\partial q}(q(t), p(t))$, $p(t_0) = p_0$.

The I-Hamiltonian $H(q,p)$ has the meaning of an I-energy. In principle,
I-energy is not related to the usual physical energy.

The simplest I-Hamiltonian $H_f(p) = ap^2$, $a \in \mathbb{Z}_p$, describes the
motion of a free I-transformer $\tau$, i.e., an I-transformer which uses only
self-motivations for changing its I-state $q(t)$. Here by solving the system of
the Hamiltonian equations we obtain: $p(t) = p_0$,
$q(t) = q_0 + 2ap_0(t - t_0)$ (see footnote 4). The motivation $p$ is the
constant of this motion. Thus the free I-transformer "does not like" to change
its motivation $p_0$ in the process of the motion in the I-space. If we change
coordinates, $q' = (q - q_0)/k$, $k = 2ap_0$, then we see that the dynamics of
the free I-transformer coincides with the dynamics of its I-time.

3 In fact, this is an application to the I-theory of the Hamiltonian p-adic
formalism developed in [107] (and generalized in [55]).

4 In fact, this simplest I-system is not trivial from the mathematical
viewpoint. There exist other solutions which are nonanalytic (but smooth), see
[97], [104], [107], [60]. These solutions may also have an interesting
I-interpretation. We shall discuss this problem later.

In the general case the I-energy is the sum of the I-energy of motivations
$H_f = ap^2$ (which is an analogue of the kinetic energy) and the potential
I-energy $V(q)$:

$H(q,p) = ap^2 + V(q)$.

The potential $V(q)$ is determined by fields of information.
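
For the free case $V = 0$ the analytic solution $p(t) = p_0$,
$q(t) = q_0 + 2ap_0(t - t_0)$ can be verified by formal differentiation of
polynomials in $t$, which makes sense over any field, including
$\mathbb{Q}_p$. The sketch below uses rational sample values for $a$, $q_0$,
$p_0$, $t_0$ (our choice) and a hypothetical helper `deriv`.

```python
from fractions import Fraction

def deriv(poly):
    """Formal derivative of a polynomial with coefficients [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(poly)][1:] or [Fraction(0)]

a, q0, p0, t0 = Fraction(3), Fraction(1), Fraction(2), Fraction(0)

# q(t) = q0 + 2 a p0 (t - t0) and p(t) = p0, written as coefficient lists.
q = [q0 - 2 * a * p0 * t0, 2 * a * p0]
p = [p0]

# For H = a p^2: dH/dp = 2 a p and dH/dq = 0.
dH_dp = [2 * a * c for c in p]
dH_dq = [Fraction(0)]

q_dot = deriv(q)
p_dot = deriv(p)
```

The check `q_dot == dH_dp` and `p_dot == [-c for c in dH_dq]` confirms the two
Hamiltonian equations for this particular solution. It says nothing about the
nonanalytic solutions mentioned in the footnote.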

In the Hamiltonian framework we can consider interactions between
I-transformers $\tau_1, \dots, \tau_N$. These I-transformers have the I-times
$t_1, \dots, t_N$ and I-states $q_1(t_1), \dots, q_N(t_N)$. By our model we can
describe interactions between these I-transformers only in the case in which
there is the possibility to choose the same I-time $t$ for all of them. In this
case we can consider the evolution of the system of the I-transformers
$\tau_1, \dots, \tau_N$ as a trajectory in the I-space
$\mathbb{Z}_p^N = \mathbb{Z}_p \times \cdots \times \mathbb{Z}_p$,
$q(t) = (q_1(t), \dots, q_N(t))$.

We think that the condition of consistency

$t_1 = t_2 = \dots = t_N = t$    (2.1)

plays the crucial role in many psychological experiments. We cannot obtain
sensible observations for interactions between arbitrary individuals. There
must be a process of learning for the group $\tau_1, \dots, \tau_N$ which
reduces the I-times $t_1, \dots, t_N$ to the unique I-time $t$.

Thus, let us consider a group $\tau_1, \dots, \tau_N$ of I-transformers with
the internal time $t$. The dynamics of I-states and motivations is determined
by the I-energy $H(q,p)$, $q \in \mathbb{Z}_p^N$, $p \in \mathbb{Z}_p^N$. It is
natural to assume that

$H(q,p) = \sum_{j=1}^{N} a_j p_j^2 + V(q_1, \dots, q_N)$, $a_j \in \mathbb{Z}_p$.

Here $H_f(p) = \sum_{j=1}^{N} a_j p_j^2$ is the total energy of motivations for
the group $\tau_1, \dots, \tau_N$ and $V(q)$ is the potential energy. As usual,
to find a trajectory in the phase I-space
$\mathbb{Z}_p^N \times \mathbb{Z}_p^N$, we need to solve the system of
Hamiltonian equations: $\dot{q}_j = \frac{\partial H}{\partial p_j}$,
$\dot{p}_j = -\frac{\partial H}{\partial q_j}$, $q_j(t_0) = q_{0j}$,
$p_j(t_0) = p_{0j}$.

Consequences for cognitive and social sciences and psychology:

1. Energy and information. In our model transmission of information is
determined by the I-energy which is the sum of the I-energy of motivations and
the potential I-energy. In principle, this process needs no physical energy.

2. Distance and information. I-processes may evolve in an I-space which
differs from the real space (the absolute Newton space or spaces of general
relativity). Therefore the real (‘physical’) distance between I-transformers
does not play the crucial role in processes of I-interactions.

3. Time and information. I-dynamics is dynamics with respect to I-time $t$.
There may be a correspondence $t_{phys} = g(t)$ between real time
$t_{phys} \in \mathbb{R}$ and I-time $t \in \mathbb{Z}_p$. This correspondence
may not preserve distances. Thus some I-interactions may be interpreted as
I-interactions with the past or future.

4. Motivation. A motion in the I-space depends not only on the initial I-state
$q_0$, but also on the initial motivation $p_0$. Moreover, the Hamiltonian
structure of the equations of motion implies that the motivation $p(t)$ plays
an important role in the process of the evolution. Thus I-dynamics is, in fact,
dynamics in phase I-space.

5. Social phenomena. By our model any social group $G$ can be described
by a system $\tau_1, \dots, \tau_N$ of coupled I-transformers. There exists an
I-potential $V(q_1, \dots, q_N)$ which determines an I-interaction between
members of $G$. For example, democratic societies are characterized by uniform
I-potentials $V = \sum_{i \ne j} \Phi(q_i - q_j)$. Here a contribution to the
potential I-energy does not depend on an individual. On the other hand,
hierarchic societies are characterized by I-potentials of the form:

$V = A_0 \sum_{j \ne 0} \Phi(q_0, q_j) + A_1 \sum_{j \ne 0,1} \Phi(q_1, q_j) + \cdots + A_k \sum_{j \ne 0,\dots,k} \Phi(q_k, q_j) + B \sum_{i,j \ne 0,\dots,k} \Phi(q_i, q_j)$,

where $|A_0|_p \gg |A_1|_p \gg \cdots \gg |A_k|_p \gg |B|_p$. These potentials
describe the hierarchy
$\tau_0 \to \tau_1 \to \cdots \to \tau_k \to (\tau_{k+1}, \dots, \tau_N)$. The
I-transformer $\tau_0$ can be a political, national or state leader or a God.

Remark 2.1. (Active information) Our ideas about information and information
fields are similar to the ideas of D. Bohm and B. Hiley (BH) [13] (see
especially p. 35-38). Like BH, we do not follow “Shannon’s ideas that there is
a quantitative measure of information that represents the way in which the
state of a system is uncertain for us”, [13]. We also consider information as
active information. Such information interacts with I-transformers. As a
consequence of such interactions I-transformers produce new information. The
only distinguishing feature is that material objects are not involved in our
formalism. According to BH active information interacts with material objects
(for example, the ship guided by radio waves). BH assume that information
fields have nonzero physical energy which directs other (probably very large)
physical energy. However, physical energies are not involved in our model. Thus
we need not assume that I-fields have some physical energy. In particular, we
need not try to find (as BH do, [13], p.38) an origin of such an energy. We
also remark that BH discuss only quantum $\psi$-fields. We shall use both
classical and quantum I-fields.

In fact, BH consider objects which are similar to our I-transformers. However,
they think that such objects must always have a material realization (i.e.,
they must be represented in real space). In our model we do not need such a
realization. We think that there might be I-transformers (such as
consciousness) which could not in principle be represented in the real space.

BH discussed a difference between ‘active’ and ‘passive’ information. In fact,
our model supports their conclusion that “all information is at least
potentially active and that complete passivity is never more than an
abstraction...” [13], p.37. If an I-transformer $\tau$ moves in the field of
forces $V$ (classical or quantum), then the information $x \in$ supp $V$ is
active for $\tau$ and the information $x \in \mathbb{Z}_p \setminus$ supp $V$
is passive for $\tau$. Let $V = V(t, x)$ be a time dependent potential. Then
the set of active information $X(t) =$ supp $V(t)$ evolves in I-space. Thus
some passive information becomes active and vice versa.

Remark 2.2. (Mind-matter interaction) By our model there is no difference
between ‘material’ and ‘nonmaterial’ I-transformers. Interactions between mind
and matter are only particular interactions between I-transformers. On the one
hand, material objects can generate I-fields (fields of I-forces) which change
I-states of cognitive objects. On the other hand, cognitive objects can also
generate I-fields which change I-states of material objects. Later we shall see
that cognitive objects are quantum I-objects. Thus mind-matter interaction is a
particular case of interactions between quantum and classical objects.


3 Information velocity, acceleration, mass and 
force, Newton’s laws 


We have considered dynamics of I-transformers of the unit mass. There the
coefficient $v$ of proportionality between the variation $\delta q$ of the
I-state and the variation $\delta t$ of I-time $t$, $\delta q = v\delta t$, was
considered as a motivation. In the general case the motivation $p$ may not
coincide with $v$. Let us assume that the motivation $p$ is proportional to
$v$, $p = mv$, $m \in \mathbb{Z}_p$. This coefficient $m$ of proportionality is
called an I-mass. We also call $v$ an I-velocity. Thus
$\delta q = \frac{p}{m}\delta t$.

Let $\tau_1$ and $\tau_2$ be two I-transformers with the I-masses $m_1$ and
$m_2$ and let $|m_1|_p > |m_2|_p$. Let $\tau_1$ and $\tau_2$ have the
variations $\delta t_1$, $\delta t_2$ of I-time of the same p-adic magnitude,
$|\delta t_1|_p = |\delta t_2|_p$, and let these variations generate the
variations $\delta q_1$ and $\delta q_2$ of their I-states of the same p-adic
magnitude, $|\delta q_1|_p = |\delta q_2|_p$. To make such a change of the
I-state, $\tau_1$ needs a larger motivation:
$|p_1|_p = |\delta q_1/\delta t_1|_p |m_1|_p > |p_2|_p = |\delta q_2/\delta t_2|_p |m_2|_p$.
Thus the I-mass is a measure of an inertia of information. We define a kinetic
I-energy by $T = \frac{p^2}{2m}$. A variation $\delta t$ of I-time $t$ implies
also a variation $\delta p$ of the motivation $p$: $\delta p = f\delta t$. The
coefficient $f$ of proportionality is called an I-force. Thus any change of the
motivation is due to the action of an I-force $f$. If $f = 0$ then
$\delta p = 0$ for any variation $\delta t$ of $t$. Thus an I-transformer
cannot change its motivation in the absence of I-forces. By analogy with the
usual physics we call the coefficient $a$ of proportionality between the
variation $\delta v$ of the I-velocity $v$ and the variation $\delta t$ of the
I-time $t$, $\delta v = a\delta t$, an I-acceleration. Thus
$\delta p = am\delta t$. This relation can be rewritten in the form of an
information analogue of the second Newton law:

$ma = f$ or $\dot{p} = f$.    (3.1)

An I-force $f$ is said to be a potential force if there exists a function
$V(q)$ such that $f = -\frac{\partial V}{\partial q}$, where $V$ is called the
potential, or potential energy. The total I-energy $H$ is defined as the sum of
the kinetic and the potential I-energies,
$H(q,p) = \frac{p^2}{2m} + V(q)$. The Hamiltonian equation
$\dot{p} = -\frac{\partial H}{\partial q}$ coincides with the Newton equation
$\dot{p} = f$.
Example 3.1. (Hooke's I-system). Let the I-force $f$ be proportional to
the I-state $q$, $f = m\beta^2 q$, where $m$ is the I-mass and $\beta \in \mathbb{Z}_p$ is a coefficient
of the interaction. Here (3.1) gives the equation $\ddot q = \beta^2 q$. As $f = -\partial V/\partial q$,
$V(q) = -\frac{m\beta^2 q^2}{2}$ and $H(q,p) = \frac{p^2}{2m} - \frac{m\beta^2 q^2}{2}$, the Hamiltonian equations are
$\dot q = p/m$ and $\dot p = m\beta^2 q$. Their solutions have the form $q(t) = a e^{\beta t} + b e^{-\beta t}$.
The I-state $q(t)$ and motivation $p(t)$ are defined only for instants of I-time
which satisfy the inequality
$$|\beta t|_p \le r_p. \qquad (3.2)$$
This condition can be considered as a restriction on the magnitude of the
I-force. If the coefficient of the interaction satisfies $|\beta|_p \le r_p$, then the dynamics $q(t)$
of the I-state is well defined for all $t \in \mathbb{Z}_p$. Larger forces imply a
restriction on I-time. Let $|\beta|_p = 1$. If $p \ne 2$ then (3.2) has the
form $t \in U_{1/p}(0)$, i.e., $t = \alpha_1 p + \alpha_2 p^2 + \cdots$. Thus the I-state $q(t)$ of
the I-transformer $\tau$ can be defined (observed) only for the instants of time
$t_0 = 0, t_1 = p, \ldots, t_{p-1} = (p-1)p, \ldots$. If $p = 2$ then (3.2) has the form
$t \in U_{1/4}(0)$, i.e., $t = \alpha_2 2^2 + \alpha_3 2^3 + \cdots$. Thus the I-state $q(t)$ of $\tau$ can be
defined (observed) only for the instants of time $t_0 = 0, t_1 = 4, t_2 = 8, \ldots$.
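The admissible instants of I-time in Example 3.1 can be listed numerically. The sketch below is only an illustration: the helper names `padic_norm`, `convergence_radius` and `admissible_instants` are hypothetical, and it assumes $r_p = 1/p$ for odd $p$ and $r_2 = 1/4$, as the text's description of the balls $U_{1/p}(0)$ and $U_{1/4}(0)$ suggests.

```python
from fractions import Fraction


def padic_norm(x, p):
    """p-adic absolute value |x|_p of a rational number x."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    v = 0
    num, den = x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(1, p) ** v


def convergence_radius(p):
    """Assumed value of r_p in (3.2): 1/p for odd p, 1/4 for p = 2."""
    return Fraction(1, 4) if p == 2 else Fraction(1, p)


def admissible_instants(beta, p, t_max):
    """Natural-number instants 0..t_max with |beta * t|_p <= r_p."""
    r = convergence_radius(p)
    return [t for t in range(t_max + 1) if padic_norm(beta * t, p) <= r]


if __name__ == "__main__":
    print(admissible_instants(1, 3, 12))  # multiples of 3
    print(admissible_instants(1, 2, 12))  # multiples of 4
```

Only natural-number instants are enumerated here; the full admissible set is a ball in $\mathbb{Z}_p$.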
Let $f = -m\beta^2 q$, i.e., $V(q) = \frac{m\beta^2 q^2}{2}$ and $\ddot q = -\beta^2 q$. Here $q(t)$ and $p(t)$ have
the form $q(t) = a\cos\beta t + b\sin\beta t$. Here we also have the restriction relation
(3.2). In contrast to the real case, the p-adic trigonometric functions are
not periodic. There is no analogue of oscillations for the I-process described
by an analogue of Hooke's law.

Let us consider the solution of the Hamiltonian equations with the initial
conditions $q(0) = 0$ and $p(0) = m\beta$: $q(t) = \sin\beta t$, $p(t) = m\beta\cos\beta t$. We
have $qp = (m\beta/2)\sin 2\beta t$. By using the p-adic equality $|\sin a|_p = |a|_p$ we get
$|qp|_p = |m\beta|_p |\beta t|_p$. Relation (3.2) implies
$$|q|_p |p|_p \le |m\beta|_p\, r_p. \qquad (3.3)$$
This is a restriction relation for the trajectory $(q(t), p(t))$ in the phase
I-space. Let $\beta = 1/m$. Then (3.3) gives $|q|_p |p|_p \le r_p$. If the motivation $p$ is
strong, $|p|_p = 1$, then $q$ can be only of the form $q = \alpha_1 p + \alpha_2 p^2 + \cdots$ for $p \ne 2$
and $q = \alpha_2 2^2 + \alpha_3 2^3 + \cdots$ for $p = 2$. If the motivation $p$ is rather weak then
the I-state $q$ of an I-transformer can be arbitrary.
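The passage from (3.2) to (3.3) can be written out step by step. The following is a sketch that uses only the identities quoted in the text; it assumes that the double-angle formula holds for the p-adic sine and cosine series, as it does formally.

```latex
\begin{aligned}
q(t)\,p(t) &= m\beta\,\sin\beta t\,\cos\beta t = \tfrac{m\beta}{2}\,\sin 2\beta t,\\
|q(t)\,p(t)|_p &= \left|\tfrac{m\beta}{2}\right|_p |2\beta t|_p
               = |m\beta|_p\,|\beta t|_p
               \le |m\beta|_p\, r_p .
\end{aligned}
```

The factors $|1/2|_p$ and $|2|_p$ cancel for every prime, including $p = 2$.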

We discuss the role of the I-mass in the restriction relation (3.3). There
the decrease of the I-mass implies more rigid restrictions for possible I-states
(for a fixed magnitude of the motivation)⁵.

The restriction relation (3.3) is an analogue of the Heisenberg uncertainty
relations in ordinary quantum mechanics. However, we consider a classical
(i.e., not quantized) I-system. Therefore a classical I-system can have
behaviour that is similar to quantum behaviour.


4 Mathematical ‘pathologies’ 


In p-adic analysis the condition $f' \equiv 0$ does not imply that a differentiable
function $f$ is a constant, see [97]. There exist complicated continuous motions
$(q(t), p(t))$ in the I-phase space for I-transformers with zero I-energy ($\dot q \equiv 0$
or $\dot p \equiv 0$).

In psychological models these motions can be interpreted as motions without
any motivation. Such motions need no information force. On the other
hand, we can consider an I-potential $V(q)$ such that $\partial V/\partial q \equiv 0$. Here the potential
I-energy $V(q)$ can have complicated behaviour on the I-space $X = \mathbb{Z}_p$.


⁵ In psychological (social) applications we get that an individual (or a group of
individuals) with a small magnitude of I-mass and strong motivations will have a quite
restricted set of I-states.
At the same time the I-force $f \equiv 0$. Thus there may exist I-fields which do
not induce any I-force.

All mathematical pathologies can be eliminated by the consideration of
analytic functions. If $f' \equiv 0$ and $f$ is analytic then $f$ = constant⁶.
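A concrete non-constant function with zero derivative helps to see the pathology. The sketch below uses a standard textbook example that is not taken from this chapter: $f(\sum a_n p^n) = \sum a_n p^{2n}$. For this map $|f(x) - f(y)|_p = |x - y|_p^2$, so the difference quotient tends to zero everywhere although $f$ is injective. The function names are hypothetical, and the check runs only over non-negative integers, which are dense in $\mathbb{Z}_p$.

```python
def digits(x, p):
    """Base-p digits of a non-negative integer, least significant first."""
    out = []
    while x:
        x, d = divmod(x, p)
        out.append(d)
    return out


def spread(x, p):
    """f(sum a_n p^n) = sum a_n p^(2n): digit n is moved to position 2n."""
    return sum(a * p ** (2 * n) for n, a in enumerate(digits(x, p)))


def valuation(x, p):
    """Largest k such that p^k divides the non-zero integer x."""
    k = 0
    while x % p == 0:
        x //= p
        k += 1
    return k


if __name__ == "__main__":
    p = 3
    for x, y in [(1, 10), (5, 86), (0, 27)]:
        # v(f(x) - f(y)) is twice v(x - y), i.e. |f(x)-f(y)|_p = |x-y|_p^2
        print(valuation(x - y, p), valuation(spread(x, p) - spread(y, p), p))
```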


5 Quantum mechanics on information spaces 


It is quite natural to quantize classical mechanics on information spaces over
$\mathbb{Z}_p$. We give the following reasons for such a quantization. Observations of
I-quantities are statistical observations. We have to study statistical ensembles
of I-transformers (instead of studying an individual I-transformer). Such
statistical ensembles are described by quantum states $\phi$. As usual in the quantum
formalism, we can assume that a value $\lambda$ of an I-quantity $A$ can be measured
in the state $\phi$ with some probability $P_\phi(A = \lambda)$. This ideology is nothing other
than the application of the statistical (ensemble) interpretation of quantum
mechanics to information theory. By this interpretation any measurement
process has two steps: (1) a preparation procedure $\mathcal{E}$; (2) a measurement of
a quantity $B$ in the states $\phi$ which were prepared with the aid of $\mathcal{E}$.

Let us consider these steps in the information framework. By $\mathcal{E}$ we have
to select a statistical ensemble $\phi$ of I-transformers on the basis of some
I-characteristics. Typically in quantum physics a preparation procedure $\mathcal{E}$ is
realized as a filter based on some physical quantity $A$, i.e., we select elements
which satisfy the condition $A = \mu$ where $\mu$ is one of the values of $A$. We can
do the same in quantum I-theory. An I-quantity $A$ is chosen as a filter, i.e.,
I-transformers for the statistical ensemble $\phi$ are selected by the condition
$A = \mu$ where $\mu \in \mathbb{Z}_p$ is some information. For example, we can choose
$A = p$, the motivation, and select the statistical ensemble $\phi \equiv \phi(p = \mu)$
of I-transformers which have the same motivation $\mu \in \mathbb{Z}_p$. Then we realize


⁶ In psychological models we can interpret analytic trajectories in the phase I-space
as normal behaviour, i.e., an individual needs a motivation for the change of a psychological
state. Here we can observe some psychological (information) force which induces
this change. There is a psychological (information) field which generates this force.
Non-analytic trajectories with zero motivation are interpreted as abnormal psychological
behaviour (probably such trajectories correspond to mental diseases; on the other hand,
they may explain anomalous phenomena). Here an individual changes his psychological
state without any motivation in the absence of any psychological (information) force. Here,
in fact, the p-adic generalization of the Hamiltonian formalism does not work. We need
to propose a new physical formalism to describe such phenomena.
the second step of a measurement process and measure some information
quantity $B$ in the state $\phi(p = \mu)$. For example, we can measure the I-state $q$ of
I-transformers belonging to the statistical ensemble described by $\phi(p = \mu)$. We
obtain the probability distribution $P(q = \lambda \mid p = \mu)$, $\lambda, \mu \in \mathbb{Z}_p$ (the probability
that an I-transformer has the I-state $q = \lambda$ under the condition that it has
the motivation $p = \mu$). It is also possible to measure the I-energy $E$ of
I-transformers. We obtain the probability distribution $P(E = \lambda \mid p = \mu)$, $\lambda, \mu \in \mathbb{Z}_p$.
On the other hand, we can prepare the statistical ensemble $\phi(q = \mu)$ by
fixing some information $\mu \in \mathbb{Z}_p$ and selecting all I-transformers which have
the I-state $q = \mu$. Then we can measure motivations of these I-transformers
and we obtain a probability distribution $P(p = \lambda \mid q = \mu)$.

Another possibility is to use a generalization of the individual interpretation
of quantum mechanics. By this interpretation a wave function $\psi(x)$, $x \in \mathbb{R}^n$,
describes the state of an individual quantum particle. In the same way we
may assume that a wave function $\psi(x)$, $x \in \mathbb{Z}_p$, on the I-space describes the
state of an individual I-transformer $\tau$⁷.


In fact, a mathematical model for the quantum I-formalism has already been
constructed. This is quantum mechanics with p-adic valued functions, see
[57], [60], [4]. We present this model briefly. The space of quantum states
is represented as a p-adic Hilbert space $\mathcal{K}$ (see [55], [60], [65] for the theory
of such spaces). This is a $\mathbb{Q}_p$-linear space which is a Banach space (with
the norm $\|\cdot\|$) and on which is defined a symmetric bilinear form
$(\cdot,\cdot) : \mathcal{K} \times \mathcal{K} \to \mathbb{Q}_p$. This form is called an inner product on $\mathcal{K}$. It is assumed that
the norm and the inner product are connected by the Cauchy-Bunyakovsky-Schwarz
inequality: $|(x,y)|_p \le \|x\|\,\|y\|$, $x, y \in \mathcal{K}$. By definition a quantum
I-state $\phi$ is an element of $\mathcal{K}$ such that $(\phi,\phi) = 1$; a quantum I-quantity $A$ is a
symmetric bounded operator $A : \mathcal{K} \to \mathcal{K}$, i.e., $(Ax,y) = (x,Ay)$, $x, y \in \mathcal{K}$⁸.
We discuss a statistical interpretation of quantum states in the case of a
discrete spectrum of $A$. Let $\{\lambda_1, \ldots, \lambda_n, \ldots\}$, $\lambda_j \in \mathbb{Z}_p$, be eigenvalues of $A$,
$A\phi_n = \lambda_n \phi_n$, $\phi_n \in \mathcal{K}$, $(\phi_n, \phi_n) = 1$. The eigenstates $\phi_n$ of $A$ are considered as
pure quantum I-states for $A$, i.e., if the system of I-transformers is described


⁷ As we have seen, the problem of interpretations is an important problem of ordinary
quantum mechanics on real space. The same problem arises immediately in our
quantum I-theory. We do not like to start our investigation with a hard discussion on
the right interpretation. We can be quite pragmatic and use both interpretations at our
convenience.

⁸ In p-adic models we do not need to consider unbounded operators, because all quantum
quantities can be represented by bounded operators, see [4], [65].
by the state $\phi_n$ then the I-quantity $A$ has the value $\lambda_n \in \mathbb{Z}_p$ with probability
1. Let us consider a mixed state
$$\phi = \sum_{n \ge 1} q_n \phi_n, \quad q_n \in \mathbb{Q}_p, \qquad (5.1)$$
where $(\phi,\phi) = \sum_n q_n^2 = 1$⁹. By the statistical interpretation of $\phi$, if we
perform a measurement of the I-quantity $A$ for I-transformers belonging to
the statistical ensemble described by $\phi$ then we obtain the value $\lambda_n$ with
probability $P(A = \lambda_n \mid \phi) = q_n^2$.

The main problem (or the advantage?) of this quantum model is that
these probabilities belong to the field of p-adic numbers $\mathbb{Q}_p$. The simplest
way to eliminate this problem is to consider only finite mixtures (5.1) for
which $q_n \in \mathbb{Q}$ (the field of rational numbers $\mathbb{Q}$ is a subfield of $\mathbb{Q}_p$). In this case
the quantities $P(A = \lambda_n \mid \phi) = q_n^2$ can be interpreted as usual probabilities
(for example, in the framework of Kolmogorov's theory). Therefore we may
assume that there exist (can be prepared) quantum I-states $\phi$ which have the
standard statistical interpretation: when the number $N$ of experiments tends
to infinity, the frequency $\nu_N(A = \lambda_n \mid \phi)$ of an observation of the information
$\lambda_n \in \mathbb{Z}_p$ tends to the probability $q_n^2$.
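The rational case can be illustrated with a two-component mixture. The sketch below is an assumption-laden illustration: the coefficients $(3/5, 4/5)$ and the helper names are hypothetical, and the simulated frequencies use an ordinary pseudo-random generator to mimic the standard (Kolmogorov) statistical interpretation.

```python
from fractions import Fraction
import random


def mixture_probabilities(coeffs):
    """Return the squares q_n^2 of rational coefficients with sum q_n^2 = 1."""
    probs = [Fraction(q) ** 2 for q in coeffs]
    if sum(probs) != 1:
        raise ValueError("coefficients must satisfy sum q_n^2 = 1")
    return probs


def relative_frequencies(probs, n_trials, seed=0):
    """Simulate n_trials measurements and return the observed frequencies."""
    rng = random.Random(seed)
    weights = [float(x) for x in probs]
    outcomes = rng.choices(range(len(probs)), weights=weights, k=n_trials)
    return [outcomes.count(i) / n_trials for i in range(len(probs))]


if __name__ == "__main__":
    probs = mixture_probabilities([Fraction(3, 5), Fraction(4, 5)])
    print(probs)                               # exact probabilities q_n^2
    print(relative_frequencies(probs, 20000))  # frequencies close to them
```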

However, we can also use the p-adic probabilistic models of Chapter 4. By
using the p-adic frequency probability model for the statistical interpretation
of quantum I-states we may assume that there exist I-states $\phi$ (ensembles of
I-transformers) such that the relative frequencies $\nu_N(A = \lambda_n \mid \phi)$ have no limit
in $\mathbb{R}$, i.e., we cannot apply the standard law of large numbers in this situation.
Hence if we perform measurements of an I-quantity $A$ for such a quantum
I-state and study the observed data by using the standard statistical methods
(based on real analysis), then we shall not obtain a definite result. There
will be only random fluctuations of relative frequencies, see [60]¹⁰.

The evolution of a p-adic wave function is described by an I-analogue of


⁹ As in the usual theory of Hilbert spaces, eigenvectors corresponding to different
eigenvalues of a symmetric operator are orthogonal.

¹⁰ Such behaviour can be related to psychological experiments. Here the possibility of
using p-adic probability models has an important consequence for scientists doing
experiments with statistical I-data: the absence of statistical stabilization (random
fluctuations) does not imply the absence of an I-phenomenon. This statistical behaviour
may mean that the I-phenomenon cannot be described by the standard
Kolmogorov probability model.
the Schrödinger equation:
$$\frac{h_p}{i}\frac{\partial \psi}{\partial t}(t,x) = \frac{h_p^2}{2m}\frac{\partial^2 \psi}{\partial x^2}(t,x) - V(x)\psi(t,x), \qquad (5.2)$$
where $m$ is the I-mass of a quantum I-transformer. Here a constant $h_p$ plays
the role of the Planck constant. For purely mathematical reasons (related to
convergence of p-adic exponential and trigonometric series) it is convenient
to choose $h_p = 1/p$.

We may also present some physical arguments for such a choice. In ordinary
quantum mechanics the Planck constant is related to the measure of
discretization. The constant $h_p = 1/p$ is related to the level of discretization of
information.

We use the factor $i = \sqrt{-1}$ in (5.2), because we like to have total
coincidence with the formulas of ordinary quantum mechanics. As we have
already noted, in the p-adic case the functions $e^{ix}$ and $e^{x}$ have the same
(non-oscillating) behaviour. Therefore, in principle, we can use the analogue
of (5.2) in which the factor $i$ is omitted.

The use of $i$ implies the consideration of the extension $\mathbb{Q}_p(i) = \mathbb{Q}_p \times i\mathbb{Q}_p$
of $\mathbb{Q}_p$. Elements of this extension have the form $z = a + ib$, $a, b \in \mathbb{Q}_p$.
This extension is well defined for $p \equiv 3 \pmod 4$. As usual, we introduce a
conjugation $\bar z = a - ib$; here we have $z\bar z = a^2 + b^2$. In what follows we assume
that wave functions take values in $\mathbb{Z}_p(i) = \mathbb{Z}_p \times i\mathbb{Z}_p$.
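The condition $p \equiv 3 \pmod 4$ can be checked directly. For an odd prime, $-1$ has a square root in $\mathbb{Q}_p$ exactly when it has one modulo $p$ (a standard consequence of Hensel's lemma, stated here as background and not taken from the text), so $\mathbb{Q}_p(i)$ is a genuine quadratic extension only when no such root exists. The function name below is hypothetical.

```python
def minus_one_is_square_mod(p):
    """True if x^2 = -1 (mod p) has a solution, for an odd prime p."""
    return any((x * x + 1) % p == 0 for x in range(1, p))


if __name__ == "__main__":
    for p in [3, 5, 7, 11, 13, 17, 19, 23, 29]:
        # The extension by i is proper exactly when no root exists.
        print(p, p % 4, not minus_one_is_square_mod(p))
```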

Example 5.1. (A free I-transformer). Let the potential $V = 0$. Then the
solution of the Schrödinger equation corresponding to the I-energy $E = \frac{p^2}{2m}$
has the form¹¹
$$\psi_p(t,x) = e^{i(px - Et)/h_p}. \qquad (5.3)$$
By the choice $h_p = 1/p$ this function is well defined for all $x \in \mathbb{Z}_p$ and $t \in \mathbb{Z}_p$.
As $\psi\bar\psi = 1$, this wave function describes the uniform (p-adic probability)
distribution, see Chapter 6, on the ring of p-adic integers $\mathbb{Z}_p$. Thus an
I-transformer $\tau$ in the state $\psi$ can be observed with equal probability in any
state $x \in \mathbb{Z}_p$. In this sense the behaviour of the free I-transformer is similar to
the behaviour of the ordinary free quantum particle. On the other hand, there is


¹¹ We note that formal expressions for analytic solutions of p-adic differential equations
coincide with the corresponding expressions in the real case (in fact, we can consider
these equations over an arbitrary number field, see [60], [55]). However, the behaviours of these
solutions are different.




no analogue of oscillations¹²: $\psi(t,x) = \cos\frac{px - Et}{h_p} + i\sin\frac{px - Et}{h_p}$,
and $\left|\cos\frac{px - Et}{h_p}\right|_p = 1$, $\left|\sin\frac{px - Et}{h_p}\right|_p = \left|\frac{px - Et}{h_p}\right|_p$.
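The norm identities for the p-adic sine and cosine can be checked on rational arguments by exact arithmetic. This is only a sketch: it evaluates truncated power series with `Fraction`, assumes an argument inside the convergence region (for example $x = p$ with $p$ odd), and uses hypothetical helper names.

```python
from fractions import Fraction
from math import factorial


def padic_norm(x, p):
    """p-adic absolute value |x|_p of a rational number x."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    v = 0
    num, den = x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(1, p) ** v


def sin_partial(x, terms):
    """Partial sum of the sine series, computed exactly."""
    x = Fraction(x)
    return sum((-1) ** k * x ** (2 * k + 1) / factorial(2 * k + 1)
               for k in range(terms))


def cos_partial(x, terms):
    """Partial sum of the cosine series, computed exactly."""
    x = Fraction(x)
    return sum((-1) ** k * x ** (2 * k) / factorial(2 * k)
               for k in range(terms))


if __name__ == "__main__":
    p, x = 3, 9
    print(padic_norm(sin_partial(x, 8), p), padic_norm(x, p))  # equal norms
    print(padic_norm(cos_partial(x, 8), p))                    # norm one
```

Every partial sum already has the limiting norm, because the first term of each series strictly dominates the others p-adically.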

We consider a psychological (and social) consequence of Example 5.1: in
the absence of an external potential the same motivation $p$ may imply any
I-state $x \in \mathbb{Z}_p$.

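Let us consider mixtures of states of the form (5.3). We set $t = 0$. Let
$\psi(x) = \alpha_1 \psi_{p_1} + \alpha_2 \psi_{p_2}$, $\alpha_1, \alpha_2 \in \mathbb{Z}_p$.

If we compute $\langle \psi, \psi \rangle = \int_{\mathbb{Z}_p} \psi(x)\bar\psi(x)\,dx$ (where $dx$ is a uniform p-adic
valued distribution on $\mathbb{Z}_p$) we see a large difference with ordinary quantum
mechanics: $\langle \psi, \psi \rangle \ne \alpha_1\bar\alpha_1 + \alpha_2\bar\alpha_2$. There is a nonzero correlation term. For
$a = (p_1 - p_2)/h_p$, we have [69]:
$$T(a) = \langle \psi_{p_1}, \psi_{p_2} \rangle + \langle \psi_{p_2}, \psi_{p_1} \rangle = \frac{a \sin a}{1 - \cos a}.$$
Thus there are correlations between the motivations $p_1$ and $p_2$ in the state
$\psi$.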
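The closed form of the correlation term can be recovered in a few lines. The sketch below rests on an assumption that is not stated in this section: that the uniform p-adic valued distribution integrates exponentials as the limit of averages over residues modulo $p^n$ (a Volkenborn-type integral), which gives $\int_{\mathbb{Z}_p} e^{cx}\,dx = c/(e^{c} - 1)$ for sufficiently small $|c|_p$.

```latex
\begin{aligned}
\int_{\mathbb{Z}_p} e^{cx}\,dx
  &= \lim_{n\to\infty} p^{-n} \sum_{j=0}^{p^n-1} e^{cj}
   = \lim_{n\to\infty} \frac{e^{c p^n} - 1}{p^n\,(e^{c} - 1)}
   = \frac{c}{e^{c} - 1},\\
T(a) &= \frac{ia}{e^{ia} - 1} + \frac{-ia}{e^{-ia} - 1}
      = \frac{ia\,(1 + e^{ia})}{e^{ia} - 1}
      = \frac{ia\,(e^{-ia} - e^{ia})}{(e^{ia} - 1)(e^{-ia} - 1)}
      = \frac{2a \sin a}{2 - 2\cos a}
      = \frac{a \sin a}{1 - \cos a}.
\end{aligned}
```

Under the same assumption $\langle \psi_{p_k}, \psi_{p_k} \rangle = \int_{\mathbb{Z}_p} dx = 1$, so the diagonal terms reproduce $\alpha_1\bar\alpha_1 + \alpha_2\bar\alpha_2$ and $T(a)$ is the extra contribution.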
Example 5.2. (Quantum Hooke's system) To give an example of a
Hamiltonian with discrete spectrum, we consider the formal p-adic generalization
of the Hamiltonian of a harmonic oscillator:
$$H = -\frac{h_p^2}{2m}\frac{d^2}{dx^2} + \frac{m\omega^2}{2}x^2,$$
where $m$ is the I-mass. We consider $\omega$ simply as the coefficient of interaction
(there is no analogue of harmonic oscillations). The operator $H$ has eigenvalues
$E_n = h_p\omega n$, $n = 0, 1, \ldots$ (see [65]). However, in the p-adic case the
difference between continuous and discrete spectra is not so strong (for each
$E_n$, we have $E_n = \lim_{k\to\infty} E_{l_k}$, $l_k \ne n$). On the other hand, discreteness of a
spectrum, of course, induces some restrictions on values (information) which
can be observed.
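The remark that each eigenvalue is a p-adic limit of other eigenvalues can be made concrete. The sketch below assumes $E_n = h_p \omega n$ with $h_p = 1/p$ and $\omega = 1$, and takes the hypothetical sequence $l_k = n + p^k$; the helper names are illustrative only.

```python
from fractions import Fraction


def padic_norm(x, p):
    """p-adic absolute value |x|_p of a rational number x."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    v = 0
    num, den = x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(1, p) ** v


def energy(n, p, omega=1):
    """E_n = h_p * omega * n with the assumed choice h_p = 1/p."""
    return Fraction(omega * n, p)


if __name__ == "__main__":
    p, n = 3, 2
    for k in range(1, 6):
        gap = energy(n + p ** k, p) - energy(n, p)
        print(k, padic_norm(gap, p))  # the p-adic distance shrinks with k
```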


¹² Is it possible to reproduce oscillations with respect to ordinary real time on the basis
of the information model? It could be done by time scaling. Let $f : \mathbb{Z}_p \to \mathbb{Z}_p$ be an
arbitrary continuous function. Then $f(t + kp^n) \approx f(t)$ for all $k \in \mathbb{Z}$ for sufficiently large
$n$ (uniformly for $t \in \mathbb{Z}_p$). Let $t_{\rm phys} = g(t)$ be a law of correspondence between I-time
$t \in \mathbb{Z}_p$ and real time $t_{\rm phys} \in \mathbb{R}$. If $2\pi = g(p^n)$ then the p-adic continuity will imply
periodicity in real time. Therefore, the ordinary wave behaviour is nothing other than
a consequence of the continuity of information flows and the appropriate choice of a time
scale. Depending on the time scale an I-process may or may not exhibit wave behaviour
in the real picture of reality.
6 The pilot wave theory for cognitive systems


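Let us consider a system of $N$ I-transformers, $\tau_1, \ldots, \tau_N$, with Hamiltonian
$$H = \sum_{k} \frac{p_k^2}{2m_k} + \sum_{k,j} V_{kj}(x_k - x_j).$$
The wave function $\psi(t,x)$, $x = (x_1, \ldots, x_N)$, $x_k \in \mathbb{Z}_p^m$ (where $m$ is the
dimension of the I-space which is used for the description of the I-state of $\tau_k$),
evolves according to the Schrödinger equation (5.2).
A purely mathematical consequence of this is that
$$\frac{\partial \rho}{\partial t}(t,x) + \sum_{k} \frac{\partial}{\partial x_k} j_k(x,t) = 0,$$
where $\rho(t,x) = \psi(t,x)\bar\psi(t,x)$ is a probability density on the configuration
I-space $\mathbb{Z}_p^{mN}$ and
$$j_k(x,t) = m_k^{-1}\,\mathrm{Im}\Big(\bar\psi(t,x)\,\frac{\partial}{\partial x_k}\psi(t,x)\Big).$$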

As in the ordinary Bohm formalism, we assume that a quantum I-transformer
$\tau_k$ has at any I-time¹³ a well defined I-state $x_k$ and motivation $p_k$. I-states $x_k$
evolve according to
$$\dot x_k(t) = \frac{j_k(t,x)}{\rho(t,x)}. \qquad (6.1)$$
It is assumed that the wave function $\psi(t,x)$ drives the I-transformers $\tau_1, \ldots, \tau_N$.
If we generalize ideas of J. Bell [12], then $\psi(t,x)$ can be considered as a new
information field which is generated by an 'information life' of the system
of I-transformers. At the moment we cannot provide a clear explanation
of the origin of this information field. It is only possible to observe its
influence on trajectories of I-transformers in the I-space $\mathbb{Z}_p$. By Bohm's
ideology we consider a new I-potential (a quantum I-potential) $Q$ generated by
$\psi(t,x)$. Then the quantum I-motion can be considered as a perturbation of
the classical I-motion (based on the classical potential $V$).

This model gives a natural description of the evolution of the I-states
of a system of cognitive systems $\tau_1, \ldots, \tau_N$.

We start with an attempt to describe the work of a brain $\tau$ in the framework
of the pilot-wave I-theory. The brain $\tau$ has an incredibly complex internal
structure which generates a new information field given by the brain's wave


¹³ Of course, we assume that the I-times $t_1, \ldots, t_N$ of the I-transformers $\tau_1, \ldots, \tau_N$ satisfy the
condition of consistency (2.1).
function $\psi(t,x)$¹⁴. We claim that the field $\psi(t,x)$ induced by the brain $\tau$
is nothing other than a conscious field. Thus conscious processes are quantum
I-processes. A conscious (quantum) motion in phase I-space differs
from an unconscious (classical) motion. This difference is due to a quantum
I-potential $Q$.

We consider now a system $S$ of brains $\tau_1, \ldots, \tau_N$. The wave function $\psi(t,x)$
of $S$ depends on the I-states of all brains in $S$. Thus motions of these
brains in phase I-space $\mathbb{Z}_p^{2mN}$ are not independent. At the same time there
might be no classical potential $V$ which induces such a dependence. Of
course, if (as in the ordinary real formalism) $\psi(t,x) = \prod_{k=1}^{N} \psi_k(t,x_k)$, then the
I-motions of the brains $\tau_k$ are independent. There are no correlations between
the consciousness of different brains.


Remark 6.1. (Non-locality) This is a good place to discuss the problem
of non-locality of the pilot wave formalism. Some authors (see, for example, [30])
consider non-locality as one of the main difficulties of the pilot wave formalism.
However, non-locality is not a difficulty in our pilot I-wave formalism. This is
non-locality in the I-space. Such non-locality can be natural for some I-systems.
For cognitive systems, I-non-locality means that ideas which are separated in a
p-adic space can be correlated. However, p-adic separation means only that there
are no strong associations between ideas or groups of ideas. But this absence of
associations does not imply that these ideas could not interact.


By our model each human society $S$ has a wave function $\psi(t,x)$. This
function gives the description of a quantum potential $Q$. The quantum potential
can essentially change I-motions (i.e., evolutions of ideas) of individuals.
Different societies are characterized by quantum potentials of different
forms. This model provides an explanation of such collective phenomena as
religion or political (or national) ideology.

The same considerations can be applied to animals and plants. The only
difference is probably that here quantum I-potentials are not so strong. Thus
we come to the conclusion that there may exist a wave function $\psi_{\rm liv}(t,x)$ of all
living organisms. The wave function $\psi_{\rm liv}(t,x)$ can be represented in the


¹⁴ We do not assume that $\psi(t,x)$ gives the complete description of $\tau$. This description
might be provided on the basis of hidden variables models which describe internal
I-processes in $\tau$ (see, for example, [65], [66]). Of course, $\psi(t,x)$ induces a definite
trajectory in phase I-space $\mathbb{Z}_p^{2m}$. However, on the basis of $\psi(t,x)$ we cannot describe
internal I-processes in $\tau$. In particular, an answer to the question 'Why does $\tau$ induce an
I-field of the form $\psi(t,x)$?' could not be given by the quantum formalism.
form:
$$\psi_{\rm liv}(t,x) = \sum_{f} \psi_f(t,x),$$
where $\psi_f(t,x)$ is a wave function of the living form $f$. An observable $F$ (a
living form) can be represented by a symmetric operator in a p-adic Hilbert
space, $F\psi_f = f\psi_f$, where $f \in \mathbb{Z}_p$ is the code of the living form $f$ in the
alphabet $\{0, 1, \ldots, p-1\}$. By equation (6.1) the evolution of a fixed form
$f_0$ depends on the evolutions of all living forms $f$.

The process of the evolution of living forms is not just a process based on
Darwin's natural selection. This is a process of quantum I-evolution in which
the conscious field of all living forms plays an important role. This model
might be used to explain some phenomena which could not be explained by
Darwin's theory. For example, the beauty of the colours of animals, insects and
fishes could not be a consequence of natural evolution alone.
This is a consequence of the structure of the conscious field $\psi_{\rm liv}(t,x)$¹⁵.
For the same reasons we can explain some aspects of the relations between predators
and prey. It seems that in nature there is a well organized system which
gives predators the right to eat their prey. This system is nothing other than a
result of the evolution due to (6.1).

Our formalism improves the ordinary pilot wave theory. One of the delicate
problems of this theory is the difference between ordinary fields and $\psi$-fields.
There is no such problem in the I-theory. All I-fields have no physical
energy. Thus we need not discuss the energy balance for the field $\psi(t,x)$
(see [13], p. 38). Of course, there is still the difference between classical
and quantum rules for computing I-forces. As in the standard pilot wave
theory, the increase of the (p-adic) amplitude of $\psi(t,x)$ does not imply the
increase of the (p-adic) amplitude of the corresponding I-force. However,
even in classical mechanics over p-adic numbers the increase of the amplitude
of an I-force does not automatically imply an essential perturbation of a
trajectory in phase I-space. For example, let $f = c \in \mathbb{Z}_p$. Then $p(t) = p_0 + c(t - t_0)$
and $q(t) = q_0 + p_0(t - t_0) + \frac{c}{2}(t - t_0)^2$. Thus the deviations from the free
motion ($c = 0$) are $\Delta_p(t) = |c|_p\,|t - t_0|_p$ and $\Delta_q(t) = |c/2|_p\,|t - t_0|_p^2$.


¹⁵ Of course, at the moment we cannot find a function $\psi_{\rm liv}(t,x)$ which induces
the real distribution of colours. In any case our model implies the existence of such a field.
Therefore, the process of the evolution of colours is a process of the simultaneous evolution of
the colours of numerous living forms. These colours do not serve only the convenience of
concrete forms (as it should be according to Darwin's theory), but they were produced by the
correlated evolution of all living forms.
The quantities $\Delta_p(t)$ and $\Delta_q(t)$ can be very small for some instants of I-time.
For example, if we choose a new time-scale $\tau_k = t_0 + kp^n$, where $k = 0, 1, \ldots$
and $n$ is sufficiently large, then $\Delta_p(\tau_k)$ and $\Delta_q(\tau_k)$ will be practically zero.
For example, if the I-state and motivation are measured at the instants $\tau_k$
then we would not find a difference between the $c$-motion and free motion.
Complex p-adic Hamiltonian systems have a similar property. For example, let
$f = cq$, $c \in \mathbb{Z}_p$, and let $p_0 = q_0 = 1$. Here $p(t) = e^{ct}$ and $\Delta_p(t) = |c|_p |t|_p$.
Moreover, if we use the general formalism based on I-spaces over $\mathbb{Z}_m$, where
$m$ is not prime, then it may occur that $|ct|_m = 0$ for some instants of I-time.
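The effect of the rescaled time grid can be computed directly for the constant force $f = c$. The sketch below is illustrative only: the function names are hypothetical, unit I-mass is assumed (as in the formulas for $p(t)$ and $q(t)$ in the text), and the deviations are $|c|_p |t - t_0|_p$ and $|c/2|_p |t - t_0|_p^2$.

```python
from fractions import Fraction


def padic_norm(x, p):
    """p-adic absolute value |x|_p of a rational number x."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    v = 0
    num, den = x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(1, p) ** v


def deviations(c, t, t0, p):
    """(Delta_p, Delta_q) for constant force c, relative to free motion."""
    dt = padic_norm(t - t0, p)
    return padic_norm(c, p) * dt, padic_norm(Fraction(c, 2), p) * dt ** 2


if __name__ == "__main__":
    p, n, t0, c = 5, 4, 7, 3
    for k in range(1, 4):
        # On the grid t0 + k * p^n both deviations are at most p^-n.
        print(k, deviations(c, t0 + k * p ** n, t0, p))
```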

Cognitive considerations imply a new viewpoint on the origin of the quantum
I-field $\psi(t,x)$. According to BH (see especially [13], p. 38), the field
$\psi(t,x)$ is an external field which guides the quantum particle. However, for
a cognitive system $\tau$ (or a group of cognitive systems $\tau_1, \ldots, \tau_N$), it seems that
this field is generated by $\tau$ (or by $\tau_1, \ldots, \tau_N$). Thus the interaction between an
I-transformer and the field $\psi(t,x)$ is an interaction with a self-induced I-field.

Moreover, $\psi(t,x)$ can be considered as a new I-transformer, $C$. The transformer $C$
gets information from the I-transformers $\tau_1, \ldots, \tau_N$ (via the classical field $V$) and
$C$ changes the I-states of $\tau_1, \ldots, \tau_N$ (via equation (6.1)). The only distinguishing
feature is that the I-state of $C$ cannot be identified with a point in the
I-space $\mathbb{Z}_p^m$ for a finite $m$. However, if we extend the I-formalism by using
infinite-dimensional I-spaces over $\mathbb{Z}_p$, then $C$ can be considered as an
I-transformer. Thus each cognitive system (or a group of cognitive systems)
induces a new I-transformer $C$ (the consciousness) which evolves in an
infinite-dimensional I-space.

In principle, $C$ may induce a new field $\Phi(t, \psi(\cdot))$. This field determines
the quantum potential for $C$. The field $\Phi(t, \psi(\cdot))$ can again be considered
as an I-transformer $C'$ (which evolves in an infinite-dimensional I-space). It
is the consciousness of the consciousness $C$. If such a construction can
be repeated many (or infinitely many?) times, then there arises a 'conscious
tower', $C, C', \ldots, C^{(n)}, \ldots$. We might speculate that the motion of a
cognitive system in I-space is determined by the hierarchic conscious system
$C, C', \ldots, C^{(n)}, \ldots$. Of course, the effect of $C'$ is not present in the linear
Schrödinger equation. If we assume the hypothesis on the hierarchic conscious
structure, then the linear Schrödinger equation has to be changed to a
nonlinear equation. Hence the cognitive considerations support de Broglie's
ideas about nonlinear perturbations of Schrödinger's equation.

Remark 6.2. (Complex inner structure of quantum systems) One of
the main consequences of the BH considerations is that a quantum system (for
example, an electron) has a complex inner structure. Our consideration of
quantum cognitive systems supports this idea. At the moment the existence
of a complex inner structure for ordinary quantum systems is not strongly
motivated. On the other hand, there is no doubt that such a structure exists
in cognitive systems.

In our quantum formalism on I-spaces there is no difference between
cognitive and non-cognitive quantum systems (in particular, each such
system has the I-field $\psi(t,x)$). Thus ordinary quantum particles might be
considered as cognitive systems. From this point of view, for example, the
two slit experiment demonstrates nothing other than cognitive behaviour of
quantum particles.


7 Subjective probability as probability with respect to the statistical ensemble of human ideas


Let $\tau$ be an I-transformer representing the human brain. Here the information
space $X_\tau$ is the space of all ideas of $\tau$. We propose the following ensemble
interpretation of subjective probability theory. The space of ideas $X_\tau$ is
the statistical ensemble which is used by $\tau$ for finding subjective probabilities.
Let $A$ be some event. Then $A$ can be represented by $\tau$ as a subset of
the statistical ensemble $X_\tau$. To find the subjective probability, $\tau$ calculates the
proportion:
$$P^{\rm sub}(A) \equiv P_{X_\tau}(A) = \frac{|A|}{|X_\tau|}.$$
Let $\tau$ use the coding system with $m$ letters, $A_m = \{0, 1, \ldots, m-1\}$, and
sequences $a = (\alpha_0, \ldots, \alpha_{N-1})$, $\alpha_j \in A_m$, of length $N$ for representing
ideas. If the space of ideas $X_\tau$ of $\tau$ contains all possible information strings of
length $N$, then $P^{\rm sub}(\{a\}) = 1/m^N$ for each single idea $a \in X_\tau$.

On the other hand, in a mathematical model we can assume that the ideas
of $\tau$ are represented by information strings of infinite length. If we
suppose that the space of ideas $X_\tau$ of $\tau$ contains all infinite strings, then $X_\tau$
can be identified with the set of m-adic integers $\mathbb{Z}_m$. To define conventional
subjective probabilities, we use the uniform distribution $\mu$ on $X_\tau = \mathbb{Z}_m$. The measure
$\mu$ is uniquely determined by its values on balls of $\mathbb{Z}_m$: $\mu(U_{1/m^k}(a)) = 1/m^k$
for any $a \in \mathbb{Z}_m$ and $k \ge 1$. In fact, this is the translation invariant measure
(the Haar measure) on the additive locally compact group $\mathbb{Z}_m$. Denote by
$\mathcal{B}(\mathbb{Z}_m)$ the $\sigma$-algebra of Borel subsets of $\mathbb{Z}_m$. Thus (in the framework of real
analysis) subjective probability theory can be described by the Kolmogorov
probability space
$$\mathcal{P} = (\mathbb{Z}_m, \mathcal{B}(\mathbb{Z}_m), P^{\rm sub}), \quad\text{where } P^{\rm sub} = \mu.$$
To find the subjective probability of an event $A \in \mathcal{B}(\mathbb{Z}_m)$ (a subset of the space of
ideas $X_\tau$ of $\tau$), $\tau$ performs the integration on $A$: $P^{\rm sub}(A) = \int_A d\mu(x)$.

Some $\tau$ can have spaces of ideas $X_\tau$ which are proper subsets of $\mathbb{Z}_m$. In
such a case $\tau$ performs the integration $P^{\rm sub}(A) = \int_{A \cap X_\tau} d\mu(x)$.
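For strings of finite length the ensemble definition reduces to counting. The sketch below is a finite stand-in for the measure $\mu$ on $\mathbb{Z}_m$, with hypothetical function names: ideas are tuples of length $N$ over the alphabet $\{0, \ldots, m-1\}$, and a ball of radius $1/m^k$ is the set of strings sharing a prefix of length $k$.

```python
from fractions import Fraction
from itertools import product


def ideas(m, length):
    """All information strings of the given length over m letters."""
    return list(product(range(m), repeat=length))


def subjective_probability(event, space):
    """Proportion |A ∩ X| / |X| of the event within the ensemble of ideas."""
    space = set(space)
    return Fraction(len(set(event) & space), len(space))


def ball(prefix, m, length):
    """Strings starting with `prefix`: a ball of radius m ** -len(prefix)."""
    prefix = tuple(prefix)
    return [s for s in ideas(m, length) if s[:len(prefix)] == prefix]


if __name__ == "__main__":
    X = ideas(3, 4)
    print(subjective_probability([X[0]], X))             # single idea: 1/m^N
    print(subjective_probability(ball((2, 0), 3, 4), X))  # ball with k = 2
```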

Remark 7.1. Of course, in some situations the brain can use nonuniform
probability distributions on the statistical ensemble of ideas. Let
$\rho : \mathbb{Z}_m \to \mathbb{R}_+$ be some probability density (namely, $\int_{X_\tau} \rho(x)\,d\mu(x) = 1$). Then
$P^{\rm sub}(A) = \int_{A \cap X_\tau} \rho(x)\,d\mu(x)$.

The great role of Bayes' theorem (see Chapter 1) in subjective probability
theory has a cognitive explanation. Consider a set of hypotheses $H_i$, $i = 1, \ldots, N$
(subsets of the space of ideas $X_\tau$). Typically these sets of ideas have
a rather simple structure. As usual, we suppose that $H_i \cap H_j = \emptyset$, $i \ne j$, and
$$\bigcup_{i=1}^{N} H_i = X_\tau.$$
Let $E$ be some event (a subset of the space of ideas $X_\tau$) with a rather complicated
structure. But we suppose that the implications $x \in H_i \to x \in E$ are easily verified
for all $i = 1, \ldots, N$ (if an idea $x \in H_i$, then it is easy to check the condition
$x \in E$). Therefore the conditional (subjective) probabilities $P(E/H_i) \equiv P_{H_i}(E)$
can be easily found by $\tau$ as probabilities with respect to the statistical
ensembles $H_i$. On the other hand, it is not so easy to find the conditional
probabilities $P(H_i/E) \equiv P_E(H_i)$, because it is not so easy to verify the implication
$x \in E \to x \in H_i$ (the statistical ensemble $E$ has a quite complicated structure
and it is not easy to check even the condition $x \in E$). To find the probabilities
$P(H_i/E)$, $\tau$ uses Bayes' theorem:
$$P^{\rm Bayes}(H_i/E) = \frac{P(E/H_i)P(H_i)}{\sum_{j} P(E/H_j)P(H_j)}. \qquad (7.1)$$
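A small finite ensemble shows how (7.1) reproduces the direct ensemble probabilities $P_E(H_i)$ when the hypotheses partition the whole space of ideas. The sets and function names below are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def prob(event, ensemble):
    """Ensemble probability |event ∩ ensemble| / |ensemble|."""
    return Fraction(len(event & ensemble), len(ensemble))


def bayes_posteriors(E, hypotheses, X):
    """P(H_i/E) via (7.1), with P(E/H_i) = P_{H_i}(E) and P(H_i) = P_X(H_i)."""
    joint = [prob(E, H) * prob(H, X) for H in hypotheses]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    X = set(range(12))
    hypotheses = [set(range(0, 4)), set(range(4, 8)), set(range(8, 12))]
    E = {0, 1, 5, 8, 9, 10}
    print(bayes_posteriors(E, hypotheses, X))  # via Bayes' formula
    print([prob(H, E) for H in hypotheses])    # direct counting inside E
```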


8 Bayes’ theorem and Freud’s psychoanalysis 


The idea that Bayes’ theorem (7.1) is used by a brain 7 to calculate rather 
complicated conditional probabilities may be used as a probability explana- 
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tion of Freud’s psychoanalysis. As in the previous section, we suppose that 
7 has a processor mg which calculates probabilities P®*¥*5(H;/E) via (7.1). 
It may be that in some situations (by some psychological reasons) 7 may 
check the results of functioning of zg by calculating probabilities P(H;/F) 
directly as probabilities with respect to the ensemble of ideas and by compar- 
ing two kinds of probabilities, P®*%*(H,/F) and P(H;/F). Suppose that 7 
observes that P®*¥**(H,/E) essentially differs from P(H;/ E). Such a situation 
is quite surprising for 7. The 7 may start to calculate directly probabilities 
P(H;/E) again and again for many different events E. It may be that the 
T observes that for some class € of events E : P8#(H;/E) #4 P(H;/E). 
Such a situation may imply a psychological crisis of r and even a mental 
decease. Moreover, it may even imply a physical decease. This has a simple 
probabilistic explanation. As a consequence of the negative experience with 
Bayes’ probabilities P®*”°°(H,/F) for E € €E, the r becomes to be afraid to 
use the processor 7g to calculate conditional probabilities with respect to 
events E € £. Instead of simple calculations (7.1), the 7 must analyse the 
relation 

reEkouxe Hi; (8.1) 


to find the conditional probabilities $P(H_j/E)$ directly, as the ensemble
probabilities

$$P_E(H_j) = \frac{|E \cap H_j|}{|E|}.$$

If relation (8.1) is very complicated, then the calculation of $P_E(H_j)$ takes
a lot of time, energy and memory of the brain. The $\tau$ cannot reach a
decision in a reasonable time. Repetition of such a situation may imply that
$\tau$ becomes afraid to reach any definite decision. This behaviour is
nothing but depression. On the other hand, $\tau$ may become involved in a
practically infinite chain of considerations to analyse the structure of some
concrete event $E \in \mathcal{E}$. Such behaviour is nothing but manic
behaviour.

What is the origin of the difference between the probabilities
$P^{\rm Bayes}(H_j/E)$ (calculated via (7.1)) and $P(H_j/E) = P_E(H_j)$?

We think that Freud's theory of conscious and unconscious ideas gives
the answer to this question. According to Freud, the space of ideas $X_I$ can
be represented as

$$X_I = X_{Ic} \cup X_{Iu},$$


where $X_{Ic}$ and $X_{Iu}$ are the spaces of conscious and unconscious ideas
respectively. It is natural to suppose that $\tau$ uses the hypotheses
$H_1, \ldots, H_N$ consciously. Thus $H_j \subset X_{Ic}$, $j = 1, \ldots, N$.
On the other hand, it may be that $E \cap X_{Iu} \ne \emptyset$ for many
$E \subset X_I$ and, moreover, it may be that the 'volume' $|E \cap X_{Iu}|$ is
rather large. The conditional probabilities $P(E/H_j)$ are based on the
implication $x \in H_j \to x \in E$. Here $\tau$ uses only conscious ideas.
Thus the calculation of $P(E)$ via

$$P(E) = \sum_{j=1}^{N} P(E/H_j)P(H_j) \qquad (8.2)$$


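is also based only on conscious ideas. In fact, (8.2) gives the ensemble
probability $P_{X_{Ic}}(E) = P(E \cap X_{Ic})$ (in fact,
$\bigcup_{j=1}^{N} H_j = X_{Ic}$). Hence

$$P^{\rm Bayes}(H_j/E) = \frac{P(E \cap H_j)}{P(E \cap X_{Ic})}.$$

Thus $P^{\rm Bayes}(H_j/E) = P_{E \cap X_{Ic}}(H_j)$. On the other hand, we have

$$P(H_j/E) = \frac{P(E \cap H_j)}{P(E \cap X_{Ic}) + P(E \cap X_{Iu})}.$$

If $P(E \cap X_{Iu})$ is rather large, then $P(H_j/E) \ll P^{\rm Bayes}(H_j/E)$.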
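
The gap between the two probabilities can be seen on a toy ensemble. The sketch below is only an illustration under assumptions that are not in the text: the space of ideas is a small finite set with uniform weights, the two hypotheses partition the conscious part, and all names (`conscious`, `unconscious`, `bayes_probability`, `ensemble_probability`) are hypothetical.

```python
from fractions import Fraction

def ensemble_probability(event, hypothesis):
    """Direct ensemble probability P_E(H) = |E ∩ H| / |E|."""
    return Fraction(len(event & hypothesis), len(event))

def bayes_probability(event, hypotheses, j):
    """Value of (7.1) on a uniform ensemble.

    P(E/H_i) P(H_i) is proportional to |E ∩ H_i|, so the denominator (8.2)
    only counts the part of E covered by the (conscious) hypotheses.
    """
    weights = [len(event & h) for h in hypotheses]
    return Fraction(weights[j], sum(weights))

# Toy space of ideas: 6 conscious and 6 unconscious ideas (assumed sizes).
conscious = set(range(0, 6))
unconscious = set(range(6, 12))
hypotheses = [{0, 1, 2}, {3, 4, 5}]          # a partition of the conscious part
event = {0, 1, 3, 6, 7, 8, 9, 10}            # large unconscious component

p_bayes = bayes_probability(event, hypotheses, 0)
p_direct = ensemble_probability(event, hypotheses[0])
print(p_bayes, p_direct)
```

In this toy case the unconscious component of the event has five of its eight elements, and the directly computed value is much smaller than the Bayes value, as the inequality above describes.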

The aim of Freud's psychoanalysis is to find the unconscious component
of the event $E$, $E_u = E \cap X_{Iu}$, and to move this component into the
space of conscious ideas $X_{Ic}$ of $\tau$. The unconscious component $E_u$ can
contain a large amount of meaningless (from the conscious viewpoint) ideas and
associations. By reading Freud's books it is impossible to understand why the
conscious realization of such 'meaningless' information may really help to
solve the psychological problems of $\tau$ (and, in particular, to cure some
mental diseases; for example, depression). Our probability model (based on
probabilities with respect to ensembles of conscious and unconscious ideas)
gives an explanation of Freud's method for the treatment of mental diseases: the
conscious meaning of the ideas $x \in E_u$ does not play any role; only the
'volume' of the set $E_u$ in the space of ideas is important. After the mental
treatment (via Freud's psychoanalysis) the space of conscious ideas $X_{Ic}$ is
extended, $X_{Ic} \to X'_{Ic}$, in such a way that $E \subset X'_{Ic}$ (or at
least $|E \cap X'_{Iu}|$ becomes small; here $X'_{Iu} = X_I \setminus X'_{Ic}$).
Thus the conditional probability $P^{\rm Bayes}(H_j/E)$ (calculated via (7.1))
becomes practically equal to the conditional probability
$P(H_j/E) = P_E(H_j)$. After some attempts at a comparative analysis of
these probabilities, $\tau$ understands that the deviations of the probabilities
have been eliminated by Freud's psychoanalysis. The $\tau$ starts again to use
Bayes' formula (7.1) to calculate the probabilities $P(H_j/E)$. This implies a
great economy of time, energy and memory resources.

Remark 8.1. Of course, our probabilistic model cannot explain the whole
mechanism of mental deviations induced by the splitting of the space of ideas
$X_I$ into conscious and unconscious components. From the probabilistic
viewpoint the 'volume' $|E_u|$ of the unconscious component $E_u$ plays the
crucial role. However, it may be that $\tau$ never tries to check the result
$P^{\rm Bayes}(H_j/E)$ (obtained via (7.1)) by the direct calculation of the
ensemble probability $P(H_j/E)$. Such a $\tau$ lives in the psychological world
determined by the Bayes probabilities (7.1) and would never have psychological
problems. Our model cannot explain why some individuals $\tau$ start to check
(7.1) and analyse (8.1) while other individuals are not interested in such
considerations. Nor can our model explain the mechanism of the exchange of
ideas between the conscious and unconscious components (see [68], [69]
for some other models).
Chapter 6


Tests for randomness for p-adic 
probability theory 


As has been mentioned in Chapter 3, the first p-adic probability models
[56], [60] were attempts to extend R. von Mises' frequency probability theory
to the p-adic case (see Chapter 4, section 2). As relative frequencies
$\nu_N = n/N \in \mathbf{Q}$, we can study their behaviour not only in
$\mathbf{R}$, but also in $\mathbf{Q}_p$. It is well known that von Mises'
theory is based on two principles:

(1) the principle of the statistical stabilization of relative frequencies and
(2) the principle of randomness. As we have seen, the first principle can be
naturally generalized to the p-adic case, and p-adic probabilities are defined
as limits of relative frequencies with respect to the p-adic topology. However,
as in ordinary real probability theory, there is a large problem with the
principle of randomness. In the p-adic case the situation with stability of
limits of relative frequencies with respect to place selections is even worse
than in the real case, because the p-adic metric is very unstable: if
$|n|_p = \epsilon < 1$, then $|n + 1|_p = 1$. In the p-adic case we do not even
have the possibility to restrict our considerations to a countable number of
place selections (as we can do in the real case by the Tornier theorem). To
obtain a reasonable definition of p-adic randomness, we also tried to apply the
theory of algorithmic complexity (see, for example, [18], [76], [77], [100],
[79]). However, there was no large progress, see [60], [53], [54]. We present
now a p-adic generalization of Martin-Löf's theory [83]-[85] based on tests for
randomness$^1$. Such a generalization looks like the most natural approach to
p-adic randomness. Here


1This theory was developed by A. Khrennikov and S. Yamada [72]. 
we find natural tests for randomness for the p-adic valued uniform probability
distribution. Each test for randomness induces a series of limit theorems. On
the other hand, individual limit theorems are not good candidates for tests
for randomness, because each theorem describes the behaviour of a subsequence
$S_{n_k}(\omega)$ of the sequence $S_n(\omega) = \xi_1(\omega) + \cdots + \xi_n(\omega)$
of sums of independent equally distributed random variables.

We proved that it is possible to enumerate effectively all p-adic tests for
randomness. However, in contrast to Martin-Löf's theorem for real
probabilities, we proved that a universal p-adic test for randomness does not
exist.

We shall use the standard terminology of the book of M. Li and P. Vitanyi
[79]. The abbreviation r.e. is used for "recursively enumerable."


1 p-adic probability measures on the space of 
binary sequences 


We set $X = \{0,1\}$, $X^n = \{x = (x_1, \ldots, x_n) : x_j \in X\}$,
$X^* = \bigcup_n X^n$,
$X^\infty = \{\omega = (\omega_1, \ldots, \omega_n, \ldots) : \omega_j \in X\}$.
For $x \in X^n$, we set $l(x) = n$. For $x \in X^*$, $l(x) = n$, we define a
cylinder $U_x$ with basis $x$ by
$U_x = \{\omega \in X^\infty : \omega_1 = x_1, \ldots, \omega_n = x_n\}$. We
denote by the symbol $F_{\rm cyl}$ the algebra of subsets of $X^\infty$
generated by all cylinders.

The map $j : X^\infty \to \mathbf{Z}_2$,
$j(\omega) = \sum_{j=0}^{\infty} \omega_j 2^j$, gives a one-to-one
correspondence between $X^\infty$ and $\mathbf{Z}_2$. Thus we can identify
these sets. The algebra of cylindric sets $F_{\rm cyl}$ coincides with the
algebra $B(\mathbf{Z}_2)$ of all clopen subsets of $\mathbf{Z}_2$ (see
Chapter 4).

A function $\mu : F_{\rm cyl} \to \mathbf{Q}_p$ is a p-adic (valued) measure if
the following properties hold: (i) additivity:
$\mu(A \cup B) = \mu(A) + \mu(B)$, $A \cap B = \emptyset$,
$A, B \in F_{\rm cyl}$; (ii) boundedness:
$\|\mu\|_p = \sup\{|\mu(A)|_p : A \in F_{\rm cyl}\} < \infty$. As has been
noticed in Chapter 4, the condition of continuity (iii) is redundant as
$F_{\rm cyl} = B(\mathbf{Z}_2)$.

A function $f : X^* \to \mathbf{Q}_p$ is said to be recursive iff there is a
recursive function $g : X^* \times \mathbf{N} \to \mathbf{Q}$ such that
$|f(x) - g(x,k)|_p < 1/k$. A p-adic measure
$\mu : F_{\rm cyl} \to \mathbf{Q}_p$ is said to be recursive iff the function
$f_\mu : X^* \to \mathbf{Q}_p$, $f_\mu(x) = \mu(U_x)$, is recursive.

The uniform p-adic measure $\mu_p$ ($p \ne 2$) on $X^\infty$ is defined by

$$\mu_p(U_x) = \frac{1}{2^{l(x)}}, \quad x \in X^*. \qquad (1.1)$$

If $X^\infty$ is realized as $\mathbf{Z}_2$ and $F_{\rm cyl}$ as
$B(\mathbf{Z}_2)$, then $\mu_p$ is the p-adic valued Haar measure (translation
invariant measure) on $\mathbf{Z}_2$.

As $|1/2^n|_2 = 2^n$, the additive set function $\mu_2$ defined by (1.1) is not
bounded. Therefore we shall consider only the case $p \ne 2$.
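
The boundedness remark can be checked numerically. The following is a minimal sketch, not part of the original text: `padic_valuation`, `padic_norm` and `uniform_measure` are hypothetical helper names, and rationals are represented with Python's `fractions.Fraction`.

```python
from fractions import Fraction

def padic_valuation(q, p):
    """ord_p(q) for a nonzero rational q (p prime)."""
    q = Fraction(q)
    if q == 0:
        raise ValueError("valuation of 0 is undefined")
    num, den, v = q.numerator, q.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def padic_norm(q, p):
    """|q|_p = p**(-ord_p(q)), with |0|_p = 0."""
    q = Fraction(q)
    if q == 0:
        return Fraction(0)
    return Fraction(p) ** (-padic_valuation(q, p))

def uniform_measure(x):
    """Value (1.1) of the uniform measure on the cylinder with basis x."""
    return Fraction(1, 2 ** len(x))

x = (0, 1, 1, 0, 1)
print(padic_norm(uniform_measure(x), 3))  # norm for p = 3
print(padic_norm(uniform_measure(x), 2))  # norm for p = 2 grows with l(x)
```

The same helper also shows the instability of the p-adic metric mentioned in the introduction to this chapter: `padic_norm(9, 3)` is small while `padic_norm(10, 3)` equals 1.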

Simple considerations show that the function

$$N_{\mu_p}(x) = \inf\{\|U\|_{\mu_p} : x \in U \in B(\mathbf{Z}_2)\} = 1$$

for all $x \in \mathbf{Z}_2$. This implies that
$L_1(\mathbf{Z}_2, \mu_p) = C(\mathbf{Z}_2)$ (because all
$B(\mathbf{Z}_2)$-step functions are continuous and each continuous function can
be uniformly approximated by a sequence of $B(\mathbf{Z}_2)$-step functions).
This implies that the algebra
$(B(\mathbf{Z}_2))_{\mu_p} = \{A \subset \mathbf{Z}_2 : I_A \in L_1(\mathbf{Z}_2, \mu_p)\} = B(\mathbf{Z}_2)$.
Thus the Haar measure $\mu_p$ cannot be extended from the algebra
$B(\mathbf{Z}_2)$ to any larger algebra. In particular, $\mu_p$ cannot be
extended to the Borel $\sigma$-algebra generated by the algebra of clopen
subsets $B(\mathbf{Z}_2)$.

Let a measure $\mu : F_{\rm cyl} \to \mathbf{Q}_p$ be normalized:
$\mu(X^\infty) = 1$. Then we can consider the p-adic probability space
$\mathcal{P} = (\Omega, F, P)$, where $\Omega = X^\infty$,
$F = (F_{\rm cyl})_\mu$ (the set algebra which is obtained via the
$\mu$-extension of $F_{\rm cyl}$), and $P = \tilde{\mu}$ is a p-adic probability
measure. As for the p-adic uniform measure $\mu_p$ (the
$\mathbf{Q}_p$-valued Haar measure on $\mathbf{Z}_2$) the extension
$(F_{\rm cyl})_{\mu_p}$ coincides with $F_{\rm cyl}$, and the extension
$\tilde{\mu}_p$ coincides with $\mu_p$, the corresponding probability space is
$\mathcal{P} = (\Omega, F, P_p)$, where $\Omega = X^\infty$, $F = F_{\rm cyl}$
and $P_p = \mu_p$. The $P_p$ is called a uniform p-adic probability
distribution. We remark that the values of $P_p$ on cylinders coincide with the
values of the standard (real-valued) uniform probability distribution
$P_\infty$ on $X^\infty$. As $\mathbf{Q} \subset \mathbf{R}$ and
$\mathbf{Q} \subset \mathbf{Q}_p$, we can interpret the rational numbers
$1/2^{l(x)}$ both as real and as p-adic numbers.

In fact, we shall not use general recursive p-adic probabilities (see only 
definitions). We shall consider only the uniform p-adic probability distribu- 
tion P,, p # 2 (which is, of course, recursive). 


2 Some technical p-adic results 


The results which are obtained in this section will be used to construct p-adic 
tests and prove limit theorems for p-adic probabilities. 

For any $n, k \in \mathbf{N}$, $(n, k)$ denotes the greatest common divisor of
$n$ and $k$; for any $n \in \mathbf{N}$, $M_p(n)$ denotes the mod $p$ residue of
$n$: $n \equiv M_p(n) \bmod p$.

We set

$$\Theta_p(n) = \begin{cases} |n - M_p(n)|_p, & n \ge p, \\ 1, & 1 \le n \le p-1. \end{cases}$$
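
A short computational sketch of $M_p$ and $\Theta_p$ may help in reading the lemmas below. It is an illustration only: the function names are hypothetical, and the value 1 for $1 \le n \le p-1$ is an assumption about how the (partly illegible) definition reads.

```python
from fractions import Fraction

def residue(n, p):
    """M_p(n): the residue of n modulo p, in 0, ..., p-1."""
    return n % p

def padic_norm_int(n, p):
    """|n|_p for an integer n, with |0|_p = 0."""
    if n == 0:
        return Fraction(0)
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return Fraction(1, p ** v)

def theta(n, p):
    """Theta_p(n) = |n - M_p(n)|_p for n >= p, and 1 for 1 <= n <= p-1 (assumed)."""
    if n < 1:
        raise ValueError("Theta_p is defined here only for n >= 1")
    if n < p:
        return Fraction(1)
    return padic_norm_int(n - residue(n, p), p)

for n in (2, 4, 9, 10, 19):
    print(n, residue(n, 3), theta(n, 3))
```

For $n = \alpha + ip^N$ with $0 \le \alpha \le p-1$ and $(i,p) = 1$, this gives $\Theta_p(n) = p^{-N}$, which is the quantity that appears in the proofs below.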
Lemma 2.1. Let $n, k \in \mathbf{N}$, $k \le n$, and let $M_p(n) \ge M_p(k)$. Then


Proof. Let $n = \alpha + ip^N$, $k = \beta + jp^l$, where
$0 \le \alpha, \beta \le p-1$, $i, j, N, l \in \mathbf{N}$ and
$(i, p) = (j, p) = 1$. We have:


|(")| E l N). (ip™-p) | (ip™-2p) ,, , (ipN—jp'+p) , 1 
k/ Ip p P 2p (jp'—p) GP’) |p (2.1) 
= [e — -N i 
= r] = 


To obtain (2.1), we have used that
$n - k + 1 = ip^N - jp^l + (\alpha + 1 - \beta)$ and
$0 < \alpha + 1 - \beta \le p$; hence the last term in the numerator of
$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!}$ which is divisible by $p$ is
$(ip^N - jp^l + p)$. The cases in which $n = \alpha$ or $k = \beta$,
$0 \le \alpha, \beta \le p-1$, are considered in the same way. □

Lemma 2.2. Let $n, k \in \mathbf{N}$, $k \le n$, and let $M_p(n) + 1 \le M_p(k)$. Then


Proof. Let $n = \alpha + ip^N$, $k = \beta + jp^l$, where
$(i, p) = (j, p) = 1$, $0 \le \alpha, \beta \le p-1$. We have


“—™ 
rs 
~~ 
3 
li 


pN). GN =p), (ipN=20) | (ip — pt) 
G Poe ap. Gy: 


p (2.2) 

= jp" |, =P. 
To obtain (2.2), we have used that
$n - k + 1 = (ip^N - jp^l) - (\beta - \alpha - 1)$ and
$0 \le \beta - \alpha - 1 < p$; hence the last term in the numerator of
$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!}$ which is divisible by $p$ is
$(ip^N - jp^l)$. The cases in which $n = \alpha$ or $k = \beta$,
$0 \le \alpha, \beta \le p-1$, are considered in the same way. □


3 p-adic tests for randomness 


We use the following notation. For each set $M \subset X^*$, we set
$M^{(n)} = \{x \in M : l(x) = n\}$, $n = 1, 2, \ldots$. For each set
$W \subset X^* \times \mathbf{N}$, we set
$W_m = \{x \in X^* : (x, m) \in W\}$. Thus
$W_m^{(n)} = \{x \in X^* : l(x) = n, (x, m) \in W\}$.

Everywhere in this chapter the cardinality of a (finite) set $A$ is denoted by
the symbol $\sigma(A)$. We do not use the standard symbol $|A|$, because
we do not want to use expressions of the form $|\,|A|\,|_p$.

The following definition of a p-adic test for randomness is a natural
generalization of Martin-Löf's definition of a test for randomness for ordinary
real probabilities (in fact, in our particular case for the uniform
distribution).

Definition 1. Let $P$ be a p-adic recursive probability. A recursively
enumerable (r.e.) set $V \subset X^* \times \mathbf{N}$ is called a p-adic
$P$-test (p-adic test for randomness for the probability distribution $P$) if
it possesses the following two properties: for all $n, m \in \mathbf{N}$, we
have:

$$V_{m+1} \subset V_m, \qquad \Big|\sum_{x \in V_m^{(n)}} P(U_x)\Big|_p \le \frac{1}{p^m}. \qquad (3.1)$$


The use of p-adic tests for randomness gives the possibility to formalize
(in fact, to create) p-adic statistics. We are given the sample space $X^*$
with an associated p-adic probability distribution $P$. Given an element $x$ of
the sample space, we want to test the hypothesis "$x$ is a typical outcome".
Practically speaking, the property of being typical is the property of
belonging to a reasonable majority. To ascertain whether a given element of the
sample space belongs to a particular reasonable majority we use the notion of a
test. As in ordinary probability theory, a test is given by a prescription
that, for every level of significance $\epsilon = 1/p^m$, tells us for which
elements $x \in X^*$ the hypothesis "$x$ belongs to the majority $M$ in $X^*$"
should be rejected, where $\epsilon = 1 - P(M)$. The set $V_m$ is a critical
region on the significance level $\epsilon = 1/p^m$. If $x \in V_m$ then the
hypothesis "$x$ belongs to the majority $M$" is rejected with the significance
level $\epsilon$. We say that $x$ fails the test at the level of the critical
region $V_m$. Of course, there is a large difference between a 'p-adic
majority' and the ordinary 'real majority'. Populations which are very large
from the point of view of ordinary real probability may be very small from
the point of view of p-adic probability and vice versa.

We shall study only the uniform p-adic probability distribution.
Everywhere below $P = P_p$, $p \ne 2$. Tests for randomness for this
probability distribution we shall simply call p-adic tests. Here condition
(3.1) can be reformulated in the following way:

$$|\sigma(V_m^{(n)})|_p \le \frac{1}{p^m} \qquad (3.2)$$

(as $P(U_x) = 1/2^n$ for $x \in V_m^{(n)}$ and $|2^n|_p = 1$ for $p \ne 2$,
(3.1) has the form $|2^{-n}\sigma(V_m^{(n)})|_p \le p^{-m}$).

Proposition 3.1. Let $V$ be a p-adic test. Then, for each $(x, m) \in V$, we
have

$$l(x)(\log_p 2) \ge m \ge 1. \qquad (3.3)$$

Proof. Set $n = l(x)$. As $x \in V_m^{(n)}$, we have
$V_m^{(n)} \ne \emptyset$ and by (3.2) $\sigma(V_m^{(n)})$ is divisible by
$p^m$. Thus $2^n = \sigma(X^n) \ge \sigma(V_m^{(n)}) \ge p^m$. This implies
inequality (3.3). □

Proposition 3.2. Let $V$ be a p-adic test. Then, for each $k > m$,
$n \in \mathbf{N}$,

$$|\sigma(V_m^{(n)} \setminus V_k^{(n)})|_p \le \frac{1}{p^m}.$$

Proof. As $V_k^{(n)} \subset V_m^{(n)}$, we have:

$$\sigma(V_m^{(n)}) = \sigma(V_k^{(n)}) + \sigma(V_m^{(n)} \setminus V_k^{(n)}).$$

By the strong triangle inequality we get:
$|\sigma(V_m^{(n)} \setminus V_k^{(n)})|_p \le \max(|\sigma(V_m^{(n)})|_p, |\sigma(V_k^{(n)})|_p) \le 1/p^m$. □

As usual, we denote the integer part of a real number $z$ by $[z]$. Condition
(3.3) can be rewritten in the form

$$[l(x) \log_p 2] \ge m.$$

The function $\lambda(n) = [n \log_p 2]$, $n \in \mathbf{N}$, will play an
important role in our further considerations. For any p-adic test $V$ and
$n \in \mathbf{N}$, only the sets $V_m^{(n)}$, $m = 1, \ldots, \lambda(n)$, can
be nonempty.
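
Since $\lambda(n)$ is used repeatedly below, here is a small sketch for computing it. It uses the equivalent description $\lambda(n) = \max\{l : p^l \le 2^n\}$ (the form used later in the proof of Lemma 6.1), so that only integer arithmetic is needed; the function name `lam` is a hypothetical choice.

```python
def lam(n, p):
    """lambda(n) = [n log_p 2], computed as the largest l with p**l <= 2**n."""
    bound = 2 ** n
    power, l = p, 0
    while power <= bound:
        power *= p
        l += 1
    return l

# Highest significance level m at which words of length n can be rejected.
for n in range(1, 6):
    print(n, lam(n, 3))
```

Working with exact integers avoids the rounding problems that a floating-point evaluation of `n * log(2) / log(p)` could have near integer values.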

We give now a few examples of p-adic tests for randomness. All these
tests are related to the behaviour of sums:

$$S(x) = x_1 + \cdots + x_n, \quad x \in X^n, \ n = l(x).$$

Example 3.1. We set

$$V_m = \{x \in X^* : \Theta_p(S(x)) \ge p^m \Theta_p(l(x)), \ S(x) \ne 0 \ \text{and} \ M_p(S(x)) \le M_p(l(x))\}. \qquad (3.4)$$

To show that the set $V = \{(x, m) : x \in V_m\}$ is a p-adic test, we need only
show that (3.2) holds true. We have:

$$\sigma(V_m^{(n)}) = \sum_k \binom{n}{k},$$

where $0 < k \le n$ and $M_p(k) \le M_p(n)$,
$\Theta_p(n)/\Theta_p(k) \le 1/p^m$. To obtain (3.2), it is sufficient to use
the strong triangle inequality and Lemma 2.1.

Example 3.2. We set

$$\overline{V}_m = \{x \in X^* : \Theta_p(l(x)) \le \frac{1}{p^m} \ \text{and} \ M_p(S(x)) \ge M_p(l(x)) + 1\}. \qquad (3.5)$$

By using Lemma 2.2 we obtain that (3.2) holds true for $\overline{V}_m$. Thus
the set $\overline{V} = \{(x, m) : x \in \overline{V}_m\}$ is a p-adic test.

Example 3.3. (Finite tests) Let $n \in \mathbf{N}$ be a fixed number. Let $T$
be some subset of $X^n$, $\sigma(T) = p^{\lambda(n)}$. We set $V_m^{(n)} = T$
for $m = 1, \ldots, \lambda(n)$ and $V_j^{(n)} = \emptyset$, $j > \lambda(n)$,
and $V_j^{(k)} = \emptyset$, $k \ne n$, for all $j = 1, 2, \ldots$. Then
$V = \{(x, m) : x \in V_m\}$, $V_m = \bigcup_{n=1}^{\infty} V_m^{(n)}$, is a
finite p-adic test.
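
The finite test of Example 3.3 is easy to build explicitly. The sketch below is an illustration under assumptions: it takes $T$ to be the first $p^{\lambda(n)}$ words of $X^n$ in lexicographic order (any subset of that size would do), and the helper names are hypothetical. It then checks condition (3.2), that is, divisibility of $\sigma(V_m^{(n)})$ by $p^m$, for every level $m$.

```python
from itertools import product

def lam(n, p):
    """lambda(n): the largest l with p**l <= 2**n."""
    power, l = p, 0
    while power <= 2 ** n:
        power *= p
        l += 1
    return l

def satisfies_3_2(cardinality, p, m):
    """|cardinality|_p <= p**(-m) means that p**m divides the cardinality."""
    return cardinality % p ** m == 0

def finite_test(n, p):
    """Critical regions V_m^(n) = T for m = 1, ..., lambda(n), as in Example 3.3."""
    words = list(product((0, 1), repeat=n))   # lexicographic order
    T = words[: p ** lam(n, p)]               # assumed choice of T
    return {m: T for m in range(1, lam(n, p) + 1)}

levels = finite_test(4, 3)
print({m: len(V) for m, V in levels.items()})
print(all(satisfies_3_2(len(V), 3, m) for m, V in levels.items()))
```

The nesting condition $V_{m+1} \subset V_m$ holds trivially here, because every nonempty critical region is the same set $T$.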

To illustrate the statistical meaning of tests (3.4) and (3.5), it is useful
to consider some subsets of them corresponding to fixed values of $M_p(n)$ and
$M_p(S(x))$.

We start with test (3.4). We set

$$V_m(1, 0) = \{x \in V_m : M_p(l(x)) = 1 \ \text{and} \ M_p(S(x)) = 0\} \quad \text{and} \quad V(1, 0) = \{(x, m) : x \in V_m(1, 0)\}. \qquad (3.6)$$

This test is connected with samples of the form

$$x = (x_1, \ldots, x_{1+jp^N}), \quad j, N \in \mathbf{N}, \ (j, p) = 1. \qquad (3.7)$$

Such a sample must be rejected with the level of significance
$\epsilon = 1/p^m$ if

$$1 > |S(x)|_p \ge p^m |l(x) - 1|_p = p^{m-N}.$$

Thus the test $V(1, 0)$ rejects all samples of the form
$x = (x_1, \ldots, x_{1+jp^N})$, $(j, p) = 1$, in which the sum
$S(x) = x_1 + \cdots + x_{1+jp^N}$ is not divisible by a sufficiently high power
of $p$ (but is divisible by $p$).

A sample $x$ of form (3.7) with $S(x) = ip^k$, $(i, p) = 1$, $k \ge 1$, is
rejected with the level of significance $\epsilon = 1/p^m$ if $k \le N - m$.
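
The rejection rule for $V(1,0)$ can be written as a small predicate on the pair $(l(x), S(x))$. This is only a sketch under the reading of (3.4) and (3.6) given above; `rejected_by_v10` and its parameters are hypothetical names.

```python
from fractions import Fraction

def padic_norm_int(n, p):
    """|n|_p for an integer n, with |0|_p = 0."""
    if n == 0:
        return Fraction(0)
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return Fraction(1, p ** v)

def rejected_by_v10(length, total, p, m):
    """True if a sample with l(x) = length and S(x) = total lies in V_m(1, 0)."""
    if length <= 1 or length % p != 1:
        return False                      # not of the form (3.7)
    if not (0 < total <= length) or total % p != 0:
        return False                      # need S(x) != 0 and M_p(S(x)) = 0
    return padic_norm_int(total, p) >= p ** m * padic_norm_int(length - 1, p)

# p = 3, l(x) = 1 + 2 * 3**3 = 55, so N = 3.
for total in (3, 9, 27):
    print(total, [m for m in range(1, 4) if rejected_by_v10(55, total, 3, m)])
```

For $S(x) = ip^k$ the printed levels are exactly those with $m \le N - k$, in agreement with the rule stated above.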
For test (3.4) and $M_p(l(x)) = 1$, we can also fix $M_p(S(x)) = 1$ and obtain
a new test:

$$V_m(1, 1) = \{x \in V_m : M_p(l(x)) = 1 \ \text{and} \ M_p(S(x)) = 1\} \quad \text{and} \quad V(1, 1) = \{(x, m) : x \in V_m(1, 1)\}.$$

A sample $x$ of the form (3.7) must be rejected with the level of significance
$\epsilon = 1/p^m$ if

$$1 > |S(x) - 1|_p \ge p^m |l(x) - 1|_p = p^{m-N}.$$

Thus the test $V(1, 1)$ rejects all samples $x$ of the form (3.7) for which
$S(x) - 1$ is not divisible by a sufficiently high power of $p$ (but is
divisible by $p$).

In the same way, by fixing $M_p(n) = s = 0, \ldots, p-1$ we obtain tests
$V_m(s, q)$, $q = 0, \ldots, s$. The $V(s, q)$ rejects some samples of the form

$$x = (x_1, \ldots, x_{s+jp^N}), \quad j, N \in \mathbf{N}, \ (j, p) = 1, \qquad (3.8)$$

namely, samples for which $S(x) - q$ is not divisible by a sufficiently high
power of $p$ (but is divisible by $p$). A sample $x$ of form (3.8) with
$S(x) = q + ip^k$, $(i, p) = 1$, $k \ge 1$, is rejected with the level of
significance $\epsilon = 1/p^m$ if $k \le N - m$.

We study now test (3.5). The condition
$M_p(S(x)) \ge M_p(l(x)) + 1 > 0$ implies that this test is used to reject
(with some level of significance) some samples for which the sum $S(x)$ is not
divisible by $p$ (compare with (3.6)). We set

$$\overline{V}_m(0, 1) = \{x \in \overline{V}_m : M_p(l(x)) = 0 \ \text{and} \ M_p(S(x)) = 1\} \quad \text{and} \quad \overline{V}(0, 1) = \{(x, m) : x \in \overline{V}_m(0, 1)\}.$$

By this test we reject with the level of significance $\epsilon = 1/p^m$ all
samples of the form $x = (x_1, \ldots, x_{jp^N})$, $(j, p) = 1$, for which
$N \ge m$ and $M_p(S(x)) = 1$. We can compare the test $\overline{V}(0, 1)$
with the test $V(0, 0)$. The latter test is used to reject samples of the same
form, but with $S(x)$ divisible by $p$: $S(x) = ip^k$, $(i, p) = 1$, $k \ge 1$.
A sample is rejected with the level of significance $\epsilon = 1/p^m$ if
$k \le N - m$.

It is possible to introduce a p-adic test $\Sigma$ which covers all cases of
divisibility by $p$ of $S(x)$. We start with the following simple fact:

Proposition 3.3. Let $\Phi$ and $\Psi$ be two p-adic tests such that
$\Phi \cap \Psi = \emptyset$. Then the set $\Gamma = \Phi \cup \Psi$ is a p-adic
test with critical regions $\Gamma_m = \Phi_m \cup \Psi_m$ on the significance
level $\epsilon = 1/p^m$.

Proof. We need only prove that (3.2) holds true. We have
$|\sigma(\Gamma_m^{(n)})|_p = |\sigma(\Phi_m^{(n)}) + \sigma(\Psi_m^{(n)})|_p \le \max(|\sigma(\Phi_m^{(n)})|_p, |\sigma(\Psi_m^{(n)})|_p) \le 1/p^m$. □

We now turn back to the tests $V$ and $\overline{V}$ defined in Examples 3.1,
3.2. It is evident that $V_m \cap \overline{V}_m = \emptyset$ for all $m$. Thus
the sets $\Sigma_m = V_m \cup \overline{V}_m$ give critical regions (with
$\epsilon = 1/p^m$) of a p-adic test $\Sigma = \{(x, m) : x \in \Sigma_m\}$.


4 Some limit theorems 


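As in ordinary real probability theory, the tests $V$ and $\overline{V}$ of
Examples 3.1, 3.2 are related to some limit theorems for p-adic probability.
Let $\mathcal{P} = (\Omega, F_{\rm cyl}, P)$ be the probability space based on
the uniform p-adic distribution $P$ on the algebra $F_{\rm cyl}$ of cylindric
subsets of $\Omega = X^\infty$, $p \ne 2$. For $\omega \in \Omega$, we set
$S_n(\omega) = \omega_1 + \cdots + \omega_n$.

Theorem 4.1. For each $l \in \mathbf{N}$ the probability

$$P\Big(\Big\{\omega \in \Omega : |S_n(\omega) - M_p(S_n(\omega))|_p = \frac{1}{p^l}, \ M_p(S_n(\omega)) \le M_p(n)\Big\}\Big) \to 0$$

in $\mathbf{Q}_p$, when $|n - M_p(n)|_p \to 0$, $n \ne M_p(n)$.

Proof. By using the considerations of Example 3.1 we obtain that

$$\Big|P\Big(\Big\{\omega \in \Omega : |S_n(\omega) - M_p(S_n(\omega))|_p = \frac{1}{p^l}, \ M_p(S_n(\omega)) \le M_p(n)\Big\}\Big)\Big|_p \le p^l |n - M_p(n)|_p.$$

In particular, we obtain the following limit theorems:

Corollary 4.1. For each $l \in \mathbf{N}$, the probability

$$P(\{\omega \in \Omega : S_n(\omega) \in S_{1/p^l}(0)\}) \to 0$$

in $\mathbf{Q}_p$, when $|n|_p \to 0$.

Corollary 4.2. (see [65]) For each $l \in \mathbf{N}$, the probabilities

$$P(\{\omega \in \Omega : S_n(\omega) \in S_{1/p^l}(0)\}) \quad \text{and} \quad P(\{\omega \in \Omega : S_n(\omega) \in S_{1/p^l}(1)\})$$

tend to zero in $\mathbf{Q}_p$, when $|n - 1|_p$ tends to zero.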
Formally we can interpret Corollary 4.2 in the following way. The sum
$S_n(\omega)$ can be considered as the sum
$S_n(\omega) = \xi_1(\omega) + \cdots + \xi_n(\omega)$ of independent
equally distributed random variables $\xi_j(\omega) = 0, 1$ with probabilities
1/2. By Corollary 4.2 the probability distribution of the random variable
$S_{\lim}(\omega) = \lim_{n \to 1} S_n(\omega)$ is concentrated at the points
$a_0 = 0$ and $a_1 = 1$ of $\mathbf{Q}_p$. By symmetry reasons
$P_{S_{\lim}}(\{a_0\}) = P_{S_{\lim}}(\{a_1\}) = 1/2$. Of course, this is just
a formal statement, because Corollary 4.2 gives convergence only for spheres of
$\mathbf{Q}_p$.

Theorem 4.2. The probability

$$P(\{\omega \in \Omega : M_p(S_n(\omega)) \ge M_p(n) + 1\}) \to 0$$

when $|n - M_p(n)|_p \to 0$.

As in the case of Theorem 4.1, we can, for example, put $M_p(n) = 0$ or
$M_p(n) = 1$ and obtain the following consequences of Theorem 4.2:

Corollary 4.3. The probability

$$P(\{\omega \in \Omega : M_p(S_n(\omega)) \ge 1\}) \to 0$$

in $\mathbf{Q}_p$, when $|n|_p \to 0$.

Corollary 4.4. The probability

$$P(\{\omega \in \Omega : M_p(S_n(\omega)) \ge 2\}) \to 0$$

when $|n - 1|_p \to 0$.
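
Theorem 4.2 and its corollaries can be explored numerically. The sketch below (hypothetical helper names; not part of the original text) counts the words $x \in X^n$ with $M_p(S(x)) \ge M_p(n) + 1$ as a sum of binomial coefficients. Since $|2^n|_p = 1$ for $p \ne 2$, the p-adic norm of the probability equals the p-adic norm of this count, which the code compares with $|n - M_p(n)|_p$.

```python
from fractions import Fraction
from math import comb

def padic_norm_int(n, p):
    """|n|_p for an integer n, with |0|_p = 0."""
    if n == 0:
        return Fraction(0)
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return Fraction(1, p ** v)

def tail_count(n, p):
    """Number of x in X^n with M_p(S(x)) >= M_p(n) + 1."""
    r = n % p
    return sum(comb(n, k) for k in range(n + 1) if k % p >= r + 1)

for n in (9, 10, 27, 28, 81):
    count = tail_count(n, 3)
    print(n, padic_norm_int(count, 3), padic_norm_int(n - n % 3, 3))
```

In the printed cases the norm of the count does not exceed $|n - M_p(n)|_p$, so the probabilities become p-adically small along sequences such as $n = 3^N$ (Corollary 4.3) or $n = 1 + 3^N$ (Corollary 4.4), although $n$ itself tends to infinity in the usual sense.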
We note that

$$P(\{\omega \in \Omega : M_p(S_n(\omega)) \ge 1\}) = P(\{\omega \in \Omega : S_n(\omega) \in S_1(0)\}).$$

Thus by Corollaries 4.1 and 4.3 we obtain that

$$P(\{\omega \in \Omega : S_n(\omega) \in U_{1/p^m}(0)\}) \to 1,$$

$|n|_p \to 0$, for any $m \in \mathbf{N}$. Hence formally we obtain that the
probability distribution $P_{S_{\lim}}$ of
$S_{\lim}(\omega) = \lim_{n \to 0} S_n(\omega)$ is concentrated at the point
$a_0 = 0 \in \mathbf{Q}_p$, $P_{S_{\lim}}(\{0\}) = 1$.

It seems that in the p-adic case it is more natural to use tests for
randomness than limit theorems. In contrast to ordinary real probability
theory, in the p-adic case we have no general limit theorems for
$n \to \infty$ (in the sense of the order on $\mathbf{N}$). All limit theorems
give the convergence of probabilities for some sequences $n_k \to \infty$,
$k \to \infty$. For example, $|n_k|_p \to 0$, $n_k \ne 0$, implies that
$n_k = jp^N$, $(j, p) = 1$, $N \to \infty$, and $|n_k - 1|_p \to 0$,
$n_k \ne 1$, implies that $n_k = 1 + jp^N$, $(j, p) = 1$, $N \to \infty$, and
so on.
5 Recursive enumeration of the set of p-adic
tests 


Here we shall prove that the set of all p-adic tests is recursively enumerable.
The general scheme of the proof is the same as in the case of real
probabilities. However, the main part of the proof (an algorithm for
constructing a p-adic test on the basis of a partial recursive function)
differs strongly from the standard one (see [79]).

We start with the following well known lemma (see, for example, [79]).

Lemma 5.1. There exists a partial recursive function
$f : \mathbf{N} \times \mathbf{N} \to X^* \times \mathbf{N}$ with the
following properties:

(a1) for all $i, j \in \mathbf{N}$ such that $f(i, j) \ne \infty$, we have
$f(i, k) \ne \infty$ for all $k < j$;

(a2) a set $A \subset X^* \times \mathbf{N}$ is r.e. iff
$A = \{f(i, j) : j = 1, 2, \ldots\} \setminus \{\infty\}$, for some $i \ge 1$.

Theorem 5.1. The set of all p-adic tests is r.e.

Proof. Throughout the proof we shall use the fixed partial recursive function
$\varphi = \varphi_i = f(i, \cdot)$ given by Lemma 5.1. We set
$A_\varphi = \varphi(\mathbf{N})$. As in the standard case, we shall construct
for each $\varphi$ some total recursive function
$g : \mathbf{N} \to X^* \times \mathbf{N}$ such that $T = A_g = g(\mathbf{N})$
is a p-adic test and, if $A_\varphi$ is a p-adic test by itself, then
$T = A_\varphi$. We construct $T$ step by step using an algorithm which
produces a p-adic test at each step. In the following algorithm we shall use
sets $D_m^{(n)}$ which give approximations for the sets $T_m^{(n)}$ in the
process of building $T$ (as usual $T_m = \{x \in X^* : (x, m) \in T\}$ and
$T_m^{(n)} = \{x \in T_m : l(x) = n\}$). We shall also use sets $R_m^{(n)}$
which are registers for collecting elements of $(A_\varphi)_m^{(n)}$. The main
difference with the standard algorithm is due to the fact that we cannot
increase the sets $D_m^{(n)}$ at each step when $\varphi$ produces a value
$\varphi(j) = (x, m)$ with
$x \in (A_\varphi)_m^{(n)} = \{x : \varphi(j) = (x, m) \ \text{for some} \ j \ \text{and} \ l(x) = n\}$
(because the p-adic metric changes discontinuously:
$|x|_p \le 1/p^m \Rightarrow |x + 1|_p = 1$, $m \ge 1$). We collect (in
$R_m^{(n)}$) elements of $(A_\varphi)_m^{(n)}$ until $\sigma(R_m^{(n)})$
becomes divisible by $p^m$. After this we set $D_m^{(n)} = R_m^{(n)}$.

To be sure that the result of our construction will be an r.e. set, we
construct in parallel a function $g : \mathbf{N} \to X^* \times \mathbf{N}$
such that $T = g(\mathbf{N})$ and $g$ is a total recursive function if $T$ is
an infinite set.
1 Put $T = \emptyset$, $D_m^{(n)} = R_m^{(n)} = \emptyset$; put $j = 0$, $i = 0$, $t_m^{(n)} = 0$.

% $j$ is the argument of $\varphi$, $i$ is the argument of $g$; $t_m^{(n)} = \sigma(R_m^{(n)})$.

2 Put $j = j + 1$.

3 If $\varphi(j) = \infty$, continue indefinitely.

4 Find $\varphi(j) = (x, m)$ and $n = l(x)$.

5 If $m > [n \log_p 2]$, then $T = \emptyset$ and stop.

6 Put $R_m^{(n)} = R_m^{(n)} \cup \{x\}$ and $t_m^{(n)} = t_m^{(n)} + 1$.

7 If $|t_m^{(n)}|_p > 1/p^m$, then go to step 2.

8 If $m \ge 2$ and $D_{m-1}^{(n)} \not\supset R_m^{(n)}$, then go to step 2.

9 Put $D_m^{(n)} = R_m^{(n)}$.

% We must make step 8 before step 9 to get $T_{m-1} \supset T_m$.

10 (a) Enumerate the elements of $D_m^{(n)} = \{z_1, \ldots, z_{t_m^{(n)}}\}$;
(b) for $l = 1, \ldots, t_m^{(n)}$, put $g(i + l) = (z_l, m)$;
(c) put $i = i + t_m^{(n)}$.

% The previous step is not related to the construction of $T$; here we
construct the function $g$ which gives a recursive enumeration of $T$.

11 Put $s = m$.

12 Put $s = s + 1$.

13 If $s > [n \log_p 2]$, go to 18.

14 If $|t_s^{(n)}|_p > 1/p^s$, go to 18.

15 If $D_{s-1}^{(n)} \not\supset R_s^{(n)}$, go to 18.

16 Put $D_s^{(n)} = R_s^{(n)}$.

% We explain the meaning of steps 11 - 16. By step 9 the set $D_m^{(n)}$
has been increased. Thus condition 8 must be reconsidered for the sets
$D_s^{(n)}$ with $s > m$. It can happen that some of the sets $R_s^{(n)}$
have a number of elements $t_s^{(n)}$ that is divisible by $p^s$. If they pass
step 15, then we increase the sets $D_s^{(n)}$ by step 16.

17 Repeat step 10 for $m = s$.

18 Put $T = T \cup \bigcup_{m \le s \le [n \log_p 2]} D_s^{(n)} \times \{s\}$ and go to step 2.
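
The central idea of steps 6, 7 and 9 (collect in a register, commit only when the register size is divisible by $p^m$) can be isolated in a few lines. The sketch below is a deliberate simplification and not the algorithm itself: it handles a single pair $(n, m)$, ignores the nesting checks of steps 8 and 11 - 17, and skips repeated words, which is an assumption about how repetitions produced by $\varphi$ should be treated. All names are hypothetical.

```python
def commit_when_divisible(stream, p, m):
    """Collect words in a register R and copy R to D only when p**m divides |R|.

    Returns the final committed set D (as a list) and the sizes of D after
    each commit, so that every intermediate D satisfies condition (3.2).
    """
    register, committed, sizes = [], [], []
    for word in stream:
        if word in register:          # assumption: repeated outputs are ignored
            continue
        register.append(word)         # step 6: enlarge the register
        if len(register) % p ** m == 0:
            committed = list(register)  # step 9: commit the whole register
            sizes.append(len(committed))
    return committed, sizes

words = [(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0),
         (1, 0, 1), (1, 1, 0), (1, 1, 1)]
D, sizes = commit_when_divisible(words, p=3, m=1)
print(len(D), sizes)
```

With seven distinct words and $p = 3$, $m = 1$, the committed set grows in jumps of three, and the seventh word stays in the register without entering $D$.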


We prove now that the set $T$ which is constructed by the algorithm is a
p-adic test.

(A1) We use the parameter $j$ to denote the step (determined by step 2) of the
algorithm. We have $T_m^{(n)} = \bigcup_{j=1}^{\infty} D_m^{(n)}(j)$. As
$D_m^{(n)}(j+1) \supset D_m^{(n)}(j)$ and $\sigma(D_m^{(n)}(j))$ is divisible
by $p^m$, we get that $\sigma(T_m^{(n)})$ is divisible by $p^m$. Thus
$|\sigma(T_m^{(n)})|_p \le 1/p^m$.

(A2) By steps 8 and 15 we get that $D_m^{(n)} \supset D_{m+1}^{(n)}$,
$n, m \in \mathbf{N}$. Thus $T_m^{(n)} \supset T_{m+1}^{(n)}$,
$n, m \in \mathbf{N}$.

(A3) If steps 10 and 16 are passed an infinite number of times, then $g$ is
a total recursive function and, hence, $T = A_g$ is r.e. If steps 10
and 16 are passed only a finite number of times, then the set $T$ is finite
and, hence, r.e.

We prove now that if $V = A_\varphi$ is a p-adic test, then $T = A_\varphi$.

It is evident that $T \subset A_\varphi$. We have only to prove that
$V \subset T$. It is sufficient to prove that, for each $n$,
$V_m^{(n)} \times \{m\} \subset T_m^{(n)} \times \{m\}$ for all
$m \le [n \log_p 2]$.

For each $n$, the set $V^{(n)} = \{(x, m) \in V : l(x) = n\}$ is finite (since
$m \le [n \log_p 2]$). Thus $\varphi$ produces all elements of $V^{(n)}$ after
a finite number of steps $J = J(n, \varphi)$.$^2$ Let
$\varphi(J) = (x_J, m_J)$ (here $l(x_J) = n$ and $m_J \le [n \log_p 2]$).
We have: $D_1^{(n)} \supset D_2^{(n)} \supset \cdots \supset D_M^{(n)}$ and
$|\sigma(D_s^{(n)})|_p \le 1/p^s$ for $s = 1, \ldots, M = [n \log_p 2]$. We
also have: $R_s^{(n)} = V_s^{(n)}$ (because
$V^{(n)} \subset \varphi(\{1, 2, \ldots, J\})$ and

2Of course, some points (x, m) € V™ can appear again on some steps J’ > J. 




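$(\varphi(\{1, 2, \ldots, J\}))_s^{(n)} = R_s^{(n)}$). Thus, for all $s$,
$|\sigma(R_s^{(n)})|_p \le 1/p^s$. In particular, this holds for $s = m_J$.
Hence, for $m = m_J$, step 7 is passed.

We prove that $D_s^{(n)} = R_s^{(n)} = V_s^{(n)}$ for all
$s = 1, \ldots, m_J - 1$. As $|\sigma(R_1^{(n)})|_p \le 1/p$, $R_1^{(n)}$ has
passed step 7. But step 8 is trivial for $m = 1$. Thus by step 9 we get
$D_1^{(n)} = R_1^{(n)} = V_1^{(n)}$. For $s = 2$, we have
$|\sigma(R_2^{(n)})|_p \le 1/p^2$ and step 7 is passed. As
$D_1^{(n)} = V_1^{(n)}$ and $R_2^{(n)} = V_2^{(n)}$, we have
$D_1^{(n)} \supset R_2^{(n)}$ and step 8 is passed. By step 9 we get
$D_2^{(n)} = R_2^{(n)} = V_2^{(n)}$. We can repeat such considerations until
$s$ takes the value $m_J - 1$.

As $D_{m_J - 1}^{(n)} = V_{m_J - 1}^{(n)} \supset V_{m_J}^{(n)} = R_{m_J}^{(n)}$,
step 8 is passed for $m = m_J$ and we get
$D_{m_J}^{(n)} = R_{m_J}^{(n)} = V_{m_J}^{(n)}$. Thus we arrive at step 11
with $m = m_J$. For all $m_J < s \le M = [n \log_p 2]$, step 14 is passed
automatically. For $s = m_J + 1$ we have
$D_{m_J}^{(n)} = V_{m_J}^{(n)} \supset V_{m_J + 1}^{(n)} = R_{m_J + 1}^{(n)}$.
Hence step 15 is passed and we put
$D_{m_J + 1}^{(n)} = R_{m_J + 1}^{(n)} = V_{m_J + 1}^{(n)}$. Repeating these
considerations, we prove that $D_s^{(n)} = R_s^{(n)} = V_s^{(n)}$ for all
$s = m_J, \ldots, M$. Hence $V^{(n)} = T^{(n)}$. □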


6 No p-adic universal test 


A natural generalization of the definition of a universal test for randomness 
is the following one: 

Definition 6.1. A p-adic test $U$ is said to be universal if for every p-adic
test $V$ we can effectively find $c \in \mathbf{N}$ (depending upon $U$ and
$V$) such that $V_{m+c} \subset U_m$ for all $m$.

It is well known that in ordinary real probability theory there exists
a universal test for randomness (which is, of course, not unique). We shall
show that in p-adic probability theory there is no universal recursive test.
We start with some technical considerations. We have to study more carefully
the properties of the function $\lambda(n) = [n \log_p 2]$. As $p > 2$, we have
$\log_p 2 < 1$.

We set $L_k = [k / \log_p 2]$. If $0 < n \le L_1$, then $n \log_p 2 < 1$ and
$\lambda(n) = 0$; in the same way we have: if $L_{k-1} < n \le L_k$, then
$\lambda(n) = k - 1$, $k \ge 2$. We set $n_k = L_k + 1$.

Lemma 6.1. The inequality

$$p^{\lambda(n_k)} > 2^{n_k - 1} \qquad (6.1)$$

holds true for all $k = 1, 2, \ldots$.

Proof. We have $\lambda(n_k) = k$ and $\lambda(n_k - 1) = k - 1$. By definition
$\lambda(n) = \max\{l : p^l \le 2^n\}$. Hence, for all $n$,
$p^{\lambda(n)+1} > 2^n$. In particular,
$p^{\lambda(n_k - 1)+1} = p^k > 2^{n_k - 1}$. Hence
$p^k = p^{\lambda(n_k)} > 2^{n_k - 1}$. □

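We construct now two p-adic tests $W$ and $\overline{W}$ by using the following
procedure.

For $k = 1, 2, \ldots$ and $j = 1, \ldots, \lambda(n_k)$, we set

$$W_j^{(n_k)} = W_{\lambda(n_k)}^{(n_k)} = \{x_1, \ldots, x_{p^{\lambda(n_k)}}\}$$

and

$$\overline{W}_j^{(n_k)} = \overline{W}_{\lambda(n_k)}^{(n_k)} = \{x_{2^{n_k} - p^{\lambda(n_k)} + 1}, \ldots, x_{2^{n_k}}\}$$

and $W_l^{(n)} = \overline{W}_l^{(n)} = \emptyset$ for $n \ne n_k$ and
$l = 1, 2, \ldots$. Here we have used the lexicographic enumeration of the
elements of $X^{n_k}$, $k = 1, 2, \ldots$: $x_1, x_2, \ldots, x_{2^{n_k}}$.

Since $\sigma(W_j^{(n_k)}) = \sigma(\overline{W}_j^{(n_k)}) = p^{\lambda(n_k)}$,
$j = 1, 2, \ldots, \lambda(n_k)$, by (6.1) we obtain
$W_j^{(n_k)} \cap \overline{W}_j^{(n_k)} \ne \emptyset$ and hence

$$X^{n_k} = W_j^{(n_k)} \cup \overline{W}_j^{(n_k)}, \quad j = 1, \ldots, \lambda(n_k).$$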
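
The arithmetic behind this construction can be checked directly. The following sketch (hypothetical names; words of $X^{n_k}$ are represented only by their positions $0, \ldots, 2^{n_k} - 1$ in the lexicographic enumeration) computes $n_k$ as the smallest $n$ with $2^n > p^k$, verifies inequality (6.1), and confirms that the first and the last $p^{\lambda(n_k)}$ positions together cover all of $X^{n_k}$.

```python
def lam(n, p):
    """lambda(n): the largest l with p**l <= 2**n."""
    power, l = p, 0
    while power <= 2 ** n:
        power *= p
        l += 1
    return l

def n_k(k, p):
    """n_k = L_k + 1: the smallest n with 2**n > p**k (p odd)."""
    n = 0
    while 2 ** n <= p ** k:
        n += 1
    return n

def covers(k, p):
    """Check that W and its mirror image together exhaust X^{n_k}."""
    n = n_k(k, p)
    size, total = p ** lam(n, p), 2 ** n
    first = set(range(0, size))                 # positions of W
    last = set(range(total - size, total))      # positions of the second test
    return first | last == set(range(total))

for k in range(1, 6):
    n = n_k(k, 3)
    print(k, n, lam(n, 3), 3 ** lam(n, 3) > 2 ** (n - 1), covers(k, 3))
```

The union covers $X^{n_k}$ exactly because $2 p^{\lambda(n_k)} > 2^{n_k}$, which is inequality (6.1) multiplied by 2.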


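Theorem 6.1. A universal p-adic test does not exist.

Proof. Let us suppose that there exists a universal p-adic test $U$. Thus
we can effectively find $c_1, c_2 \in \mathbf{N}$ such that
$W_{m+c_1} \subset U_m$ and $\overline{W}_{m+c_2} \subset U_m$, where $W$ and
$\overline{W}$ are the p-adic tests constructed before this theorem. Let $k$
be so large that $\lambda(n_k) - c_1 \ge 1$ and $\lambda(n_k) - c_2 \ge 1$.
Thus $W_{1+c_1}^{(n_k)} = W_{\lambda(n_k)}^{(n_k)}$,
$\overline{W}_{1+c_2}^{(n_k)} = \overline{W}_{\lambda(n_k)}^{(n_k)}$. Hence
$U_1^{(n_k)} \supset W_{1+c_1}^{(n_k)} \cup \overline{W}_{1+c_2}^{(n_k)} = X^{n_k}$.
This implies that $|\sigma(U_1^{(n_k)})|_p = |\sigma(X^{n_k})|_p = 1$. This
contradicts (3.2). □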


7 Randomness of infinite sequences 


Let $V \subset X^* \times \mathbf{N}$ be a p-adic $P$-test, where $P$ is an
arbitrary p-adic recursive probability. We set

$$O_m = \bigcup\{U_y : y \in V_m\} \subset X^\infty \quad \text{and} \quad O = \{(\omega, m) : \omega \in O_m\} \subset X^\infty \times \mathbf{N}. \qquad (7.1)$$

If the set $V_m$ is infinite, then in general $O_m$ does not belong to the set
algebra $F_{\rm cyl}$. Therefore the probability $P(O_m)$ may not be defined.

Thus we cannot generalize the standard condition for real probabilities
(namely $P(O_m) \le 1/2^m$) to define a p-adic sequential test. It seems that
the only possibility to define a p-adic sequential test is to use all tests
$O$ obtained via (7.1) from p-adic $P$-tests $V \subset X^* \times \mathbf{N}$.

Definition 7.1. Let $P : F_{\rm cyl} \to \mathbf{Q}_p$ be a recursive
probability and let $V \subset X^* \times \mathbf{N}$ be a p-adic $P$-test. The
set $O$ defined on the basis of $V$ via (7.1) is said to be a p-adic sequential
$P$-test.

Definition 7.2. Let $O$ be a p-adic sequential test. A sequence $\omega$ is
said to be $P$-random with respect to the test $O$ if

$$\omega \notin O_\infty = \bigcap_{m=1}^{\infty} O_m.$$

In general, the set $O_\infty \notin F_{\rm cyl}$ and $P$ cannot be extended to
the $\sigma$-algebra $B(X^\infty)$ containing $O_\infty$. Therefore in general
$P(O_\infty)$ is not defined.

A sequence $\omega \in O_\infty$ is considered as non-random with respect to
$P$.

As usual, we restrict our considerations to the case of the uniform p-adic
distribution $P = P_p$, $p \ne 2$.

Example 7.1. Let $O$ be the p-adic sequential test based on the p-adic
test $V$ of Example 3.1. We consider a few examples of sequences
$\omega \in X^\infty$ which are random or non-random with respect to $O$:

(1) Let only a finite number $k \ge 1$ of coordinates of $\omega = (\omega_j)$
be equal to 1: $\omega_{j_1} = \cdots = \omega_{j_k} = 1$. We show that
$\omega$ is non-random with respect to $O$. We have to show that for each $m$
there exists $n$ such that $\omega_{1:n} \in V_m^{(n)}$, where
$\omega_{1:n} = (\omega_1, \ldots, \omega_n)$. Let
$k = \beta = 1, \ldots, p-1$. We set $n = \beta + p^N$, where $N \ge m$,
$N > \log_p(j_k - \beta)$. Then $\Theta_p(n)/\Theta_p(k) = p^{-N} \le p^{-m}$
and $\omega_{1:n} \in V_m^{(n)}$. Let $k = \beta + jp^l$,
$j, l \in \mathbf{N}$, $(j, p) = 1$. We set $n = \beta + p^N$, where
$N \ge m + l$, $N > \log_p(j_k - \beta)$. Then
$\Theta_p(n)/\Theta_p(k) = p^{l-N} \le p^{-m}$ and
$\omega_{1:n} \in V_m^{(n)}$.

(2) The sequence $\omega = (0, \ldots, 0, \ldots)$ is random with respect to
$O$, because $\omega \notin O_1$ ($\omega_{1:n} \notin V_1^{(n)}$ for all
$n$); the sequence $\omega = (1, \ldots, 1, \ldots)$ is random with respect to
$O$, because $k = S(\omega_{1:n}) = n$ and $\Theta_p(n)/\Theta_p(k) = 1$ and
hence $\omega_{1:n} \notin V_1^{(n)}$ for all $n$.

(3) Here we present an example of a sequence $\omega \in X^\infty$, random with
respect to O, which contains infinitely many zeros and infinitely many ones. Any
$\omega \in X^\infty$ can be represented as a sequence of blocks $\omega = b_1 b_2 \ldots b_m \ldots$,
where $l(b_j) = p^j$. Let $S(b_1) = p$ and $S(b_j) = p^j - p^{j-1}$, $j > 1$. Set
z= b;...bm,m > 1. Then z € Vi”). Here I(x) = p™ and S(x) = p + (P? — 
p) +--+ (p” —p™) = p", thus: Op(I(2))/@p(S(a)) = p”. 
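
The count of ones in $x = b_1 \ldots b_m$ is a telescoping sum. The following worked line is only a sketch that spells out the cancellation, using the block counts $S(b_1) = p$ and $S(b_j) = p^j - p^{j-1}$ stated in the example:

```latex
S(x) = S(b_1) + \sum_{j=2}^{m} S(b_j)
     = p + \sum_{j=2}^{m} \left( p^{j} - p^{j-1} \right)
     = p + \left( p^{m} - p \right)
     = p^{m} .
```

Each term $-p^{j-1}$ cancels the preceding $+p^{j-1}$, so only the last power $p^m$ survives.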




Example 7.2. Let O be a p-adic sequential test based on the p-adic test
V of Example 3.2.


(1) We consider the same sequence w as in (1) of Example 7.1. 


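(a) Let $k = \beta$ or $k = \beta + jp^l$, $(j, p) = 1$ and $\beta = 1, \ldots, p-1$. We show that
such an $\omega \in X^\infty$ is non-random with respect to O. Let $n = (\beta - 1) + p^N$,
where $N > m$ and $N > \log_p(j_k - \beta + 1)$. Then $S(\omega_{1:n}) = k$ and hence
$M_p(S(\omega_{1:n})) = \beta \geq M_p(n) + 1$ and $\Theta_p(n) = p^{-N} < p^{-m}$. Thus $\omega_{1:n} \in V_m^{(n)}$
and $\omega \in O_m$.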


(b) Let $k = jp^t$, $(j, p) = 1$. We show that if $p^m > jp^t$, then $\omega_{1:n} \notin V_m^{(n)}$,
$n \geq 1$ (the condition $\Theta_p(l(x)) < p^{-m}$ implies that $l(x) > p^m$; but, for $\omega_{1:n}$
with $n > p^m$, we have $S(\omega_{1:n}) = k$ and, as $M_p(S(\omega_{1:n})) = 0$, there is no n
such that $M_p(S(\omega_{1:n})) \geq M_p(n) + 1$). Thus, in contrast to the test of
Example 7.1, any sequence $\omega \in X^\infty$ in which only a finite number $k = jp^t$,
$t = 1, 2, \ldots$, of coordinates are equal to 1 is considered as random with
respect to the test of the present example.


(2) The sequence $\omega = (0, \ldots, 0, \ldots)$ is random with respect to O (because,
for all $x = (0, \ldots, 0)$, $0 = M_p(S(x)) < M_p(l(x)) + 1$); the sequence
$\omega = (1, \ldots, 1, \ldots)$ is also random with respect to O (because, for all
$x = (1, \ldots, 1)$, $M_p(S(x)) = M_p(l(x))$).

(3) We consider the same sequence as in (3) of Example 7.1. We show
that some of such sequences are random with respect to O and some are
non-random. Let $\omega = b_1 b_2 \ldots b_m \ldots$ and in each block $b_j$ the first
$p^j - p^{j-1}$ elements are equal to 1 and $\omega_1 = \cdots = \omega_p = 1$ in $b_1$ (other elements
in each block are equal to 0). If, for win, Op(n) < re then M,(n) = 0 


and, hence, M,(S(win)) = 0. Thus win E ae Let w = biba... bm 

and the distribution of ones in blocks have the following structure. For 

by = (@,...,%p2), Tı = +) = p1 = 1, zp = 0, £p = 1; for bj = 

(z1, soe , Tpi), Ti St = Lpi-pi-1-1 F 1, Tpi—pi-1 = 0, Tpi+pi-1+1 = 1. Then 

Wi:p € Vi) (since M,(S(wiy)) =p—1>1+M,(p) = 1); win € Vo for 

n =p” +p) — pi! (since Swin) = p — pt — 1 implies M,(S(win)) = 1). 
As a consequence of Theorem 5.1 we obtain the following theorem:

Theorem 7.1. The set of all p-adic sequential tests is r.e.


A p-adic sequential test D is said to be universal if, for every p-adic
sequential test O, we can effectively find $c \in \mathbb{N}$ (depending upon D and O)
such that $O_{m+c} \subset D_m$ for all m.
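
The point of universality is that a universal test detects every sequence detected by any other test. A short sketch of why this follows from the definition, where $D_\infty = \bigcap_m D_m$ is notation introduced here by analogy with $O_\infty$:

```latex
O_\infty = \bigcap_{m=1}^{\infty} O_m
\ \subset\ \bigcap_{m=1}^{\infty} O_{m+c}
\ \subset\ \bigcap_{m=1}^{\infty} D_m = D_\infty .
```

The first inclusion holds because the sets $O_{m+c}$, $m \geq 1$, are among the sets $O_m$, $m \geq 1$; the second uses $O_{m+c} \subset D_m$ for every m. Hence every sequence that is non-random with respect to O is non-random with respect to D.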
At first sight, it seems natural to consider the set

$$O^{\max} = \bigcup_{i=1}^{\infty} O^{(i)}_\infty , \qquad (7.2)$$

where $O^{(i)}$, $i = 1, 2, \ldots$, is a recursive enumeration of p-adic sequential tests,
as the maximal set of p-adic non-random sequences (with respect to the
p-adic uniform distribution) and to call a sequence $\omega \in X^\infty \setminus O^{\max}$ a p-adic
random sequence. However, Theorem 6.1 (nonexistence of a universal p-adic
test) is a sign that such a procedure could not be successful.
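Proposition 7.1. The set $O^{\max}$ defined by (7.2) is equal to $X^\infty$.

Proof. Let $W$ and $\tilde W$ be the p-adic tests defined in Section 6 and let $O$ and $\tilde O$
be the corresponding p-adic statistical tests. We have that $W_j^{(n)} \cup \tilde W_j^{(n)} = X^n$,
$j = 1, \ldots, \lambda(n)$, for all n. As $\lambda(n) \to \infty$, $n \to \infty$, then $\forall\, l, m \in \mathbb{N}$ $\exists\, N =
N(l, m)$: $\lambda(N) \geq l, m$. As we also have that $W_j^{(n)} = W_1^{(n)}$ and $\tilde W_j^{(n)} = \tilde W_1^{(n)}$,
$j = 1, \ldots, \lambda(n)$, then, for $N = N(l, m)$, we obtain $W_l^{(N)} \cup \tilde W_m^{(N)} = X^N$. We
also have: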


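$$O_\infty \cup \tilde O_\infty = \Bigl(\bigcap_{l=1}^{\infty} O_l\Bigr) \cup \Bigl(\bigcap_{m=1}^{\infty} \tilde O_m\Bigr) = \bigcap_{l=1}^{\infty} \bigcap_{m=1}^{\infty} (O_l \cup \tilde O_m) .$$

Finally we show that $O_l \cup \tilde O_m = X^\infty$ for every l and m:

$$O_l \cup \tilde O_m \supset \bigcup \{ U_x : x \in W_l^{(N)} \cup \tilde W_m^{(N)} \} = X^\infty ,$$

where $N = N(l, m)$. □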
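
The set identity used in the proof is the distributive law for unions over intersections. A brief sketch of the verification, writing $O$ and $\tilde O$ for the two sequential tests of the proof:

```latex
\Bigl(\bigcap_{l} O_l\Bigr) \cup \Bigl(\bigcap_{m} \tilde O_m\Bigr)
  = \bigcap_{l} \bigcap_{m} \bigl( O_l \cup \tilde O_m \bigr).
% (\subset): if \omega lies in every O_l, or in every \tilde O_m,
%            then it lies in O_l \cup \tilde O_m for all l, m.
% (\supset): if \omega \notin O_{l_0} and \omega \notin \tilde O_{m_0}
%            for some l_0, m_0, then \omega \notin O_{l_0} \cup \tilde O_{m_0}.
```

Since each $O_l \cup \tilde O_m$ equals $X^\infty$, the right-hand side is $X^\infty$, and because $O_\infty$ and $\tilde O_\infty$ both occur in the union (7.2), the set defined there is all of $X^\infty$.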

Thus, in contrast to the real case, the existence of a recursive enumeration
of the set of all p-adic sequential tests does not imply the possibility of
a fruitful development of the theory of randomness based on the maximal
constructive set of non-random sequences. In some sense the situation here is
similar to ordinary (real) nonconstructive probability theory, where any
$B \subset X^\infty$, $P(B) = 1$, may be considered as a ‘law of randomness’ (thus the
maximal set of non-random sequences coincides with $X^\infty$).

Definition 7.2. A p-adic sequential test D is said to be universal if,
for every p-adic sequential test O, we can effectively find $c \in \mathbb{N}$ such that
$O_{m+c} \subset D_m$ for all m.

Lemma 7.1. Let $n_j$, $j = 1, 2, \ldots$, be numbers associated with the function
$\lambda$. Then, for $m_j = \lambda(n_j)$,

$$N_j = (2^{n_j} - p^{m_j})\, 2^{n_{j+1} - n_j} < p^{m_{j+1}}, \qquad j = 1, 2, \ldots .$$

Proof. By (6.1) we obtain: $p^{m_j} 2^{-n_j} > 1/2$. Thus $(1 - p^{m_j} 2^{-n_j}) < 1/2$
and hence

$$N_j = (1 - p^{m_j} 2^{-n_j})\, 2^{n_{j+1}} < 2^{n_{j+1} - 1} .$$

But by (6.1) we also have $p^{m_{j+1}} > 2^{n_{j+1} - 1}$. □

Proposition 7.2. The trivial p-adic sequential test O with $O_m = X^\infty$,
$m \geq 1$, is the (unique) universal p-adic sequential test.

Proof. We prove that the O with $O_m = X^\infty$ for all $m \geq 1$ is a p-adic
sequential test: there exists a p-adic test $V \subset X^* \times \mathbb{N}$ such that V induces
O. Let $n_j$ be natural numbers associated with $\lambda$ and let $m_j = \lambda(n_j)$.

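We represent $X^{n_j} = A_{m_j}^{(n_j)} \cup B_{m_j}^{(n_j)}$, $A_{m_j}^{(n_j)} \cap B_{m_j}^{(n_j)} = \emptyset$ and $\sigma(A_{m_j}^{(n_j)}) =
p^{m_j}$ (and, consequently, $\sigma(B_{m_j}^{(n_j)}) = 2^{n_j} - p^{m_j}$), where the sets $A_{m_j}^{(n_j)}$ are
constructed by the following procedure. We set $A_{m_1}^{(n_1)} = \{x_1, \ldots, x_{p^{m_1}}\}$,
where $X^{n_1} = \{x_1, \ldots, x_{p^{m_1}}, \ldots, x_{2^{n_1}}\}$. Suppose that the set $A_{m_j}^{(n_j)}$ has been
constructed. We set

$$C_{m_{j+1}}^{(n_{j+1})} = \{ x \in X^{n_{j+1}} : x \text{ has a prefix } y \text{ belonging to the set } B_{m_j}^{(n_j)} = X^{n_j} \setminus A_{m_j}^{(n_j)} \} = B_{m_j}^{(n_j)} \times X^{n_{j+1} - n_j} .$$

The set $A_{m_{j+1}}^{(n_{j+1})}$ is the union of the set $C_{m_{j+1}}^{(n_{j+1})}$ and the first
$p^{m_{j+1}} - \sigma(C_{m_{j+1}}^{(n_{j+1})}) = p^{m_{j+1}} - N_j$ (where $N_j = (2^{n_j} - p^{m_j})\, 2^{n_{j+1} - n_j}$) elements
of $X^{n_{j+1}} = (x_1, \ldots, x_{2^{n_{j+1}}})$ which do not belong to $C_{m_{j+1}}^{(n_{j+1})}$. By Lemma 7.1,
$p^{m_{j+1}} - N_j > 0$ for all $j \geq 1$. Thus this procedure defines the sets $A_{m_j}^{(n_j)}$
for all $j \geq 1$.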

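We set $V_{m_k} = \bigcup_{j=k}^{\infty} A_{m_j}^{(n_j)}$ and $V_m = V_{m_k}$ for $m_{k-1} < m \leq m_k$. We prove
that $V = \{(x, m) : x \in V_m\}$ is a p-adic test. The set V is r.e. and $V_m \supset V_{m+1}$
by the procedure of construction. We also have: $V_{m_k}^{(n_j)} = A_{m_j}^{(n_j)}$, $j \geq k$, and
$V_{m_k}^{(n)} = \emptyset$, $n \neq n_j$, $j \geq k$. Thus

$$|\sigma(V_{m_k}^{(n_j)})|_p = |\sigma(A_{m_j}^{(n_j)})|_p = p^{-m_j} \leq p^{-m_k} .$$

On the other hand, for each $m_k$, we have:

$$\Bigl(\bigcup\{U_x : x \in A_{m_k}^{(n_k)}\}\Bigr) \cup \Bigl(\bigcup\{U_x : x \in A_{m_{k+1}}^{(n_{k+1})}\}\Bigr)
\supset \Bigl(\bigcup\{U_x : x \in A_{m_k}^{(n_k)}\}\Bigr) \cup \Bigl(\bigcup\{U_{ya} : y \in B_{m_k}^{(n_k)},\ a \in X^{n_{k+1} - n_k}\}\Bigr)$$

$$= \Bigl(\bigcup\{U_x : x \in A_{m_k}^{(n_k)}\}\Bigr) \cup \Bigl(\bigcup\{U_y : y \in B_{m_k}^{(n_k)}\}\Bigr) = X^\infty . \quad □$$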

The previous results imply that in the p-adic case (as in Schnorr’s
theory of randomness [98]) the only reasonable approach to randomness of
infinite sequences is to use randomness with respect to a concrete p-adic
sequential test O. Of course, the use of O-randomness has quite different
origins in our theory and in Schnorr’s theory. It seems that in the p-adic case
this situation is a consequence of the impossibility of defining a σ-additive
(non-discrete) probability on the σ-algebra generated by $F_{\mathrm{cyl}}$. Thus we have no
other possibility than to identify sequential tests O with tests $V \subset X^* \times \mathbb{N}$.
In Schnorr’s theory this situation is a consequence of the use of total recursive
null-sets.